Developing New VPN Experiments to Better Inform Circumvention Tactics

The following blog post was written by ICFP fellow Ain Ghazal as a summary of their work on Censorship Resistance Systems in global VPN infrastructure.
Thu, 2023-06-01 18:05

Adoption and utilization of VPN technologies have increased over the last decade as more and more Internet users rely on these tools to guarantee their own personal “right to whisper”. Yet barriers to entry and to successful VPN utilization persist and evolve across the globe. As established by Chinmayi S K, a prior fellow in OTF’s Information Controls Fellowship Program (ICFP), adoption of these essential technologies has not been uniform around the world, as obstacles in usability and outreach continue to suppress VPN adoption levels in areas of conflict and crackdowns. And even for individuals who are able to adopt VPNs, a range of new barriers to successful VPN utilization has emerged as government censors become increasingly sophisticated in their suppression tactics. In turn, many VPN users today find themselves increasingly targeted in countries such as Uganda, Pakistan, India, Russia, and Iran.

Given this uncertain state of play, knowledge about the status of VPN censorship – and how it relates to one’s own use and safety – is crucial. An accurate map of the situation worldwide, as well as a better understanding of censor capabilities by region/state, must be established. Accordingly, I spent my 2022 ICFP fellowship paired with the Open Observatory of Network Interference (OONI) working to quantify interference on VPN connections in an effort to improve existing measurements and increase the reliability of Censorship Resistance Systems in global VPN infrastructure. I discuss my work and findings from the fellowship below in the hopes that these tools and methodology will contribute to the creation of even more resilient circumvention tools in the future.

Research Overview

For several years the OONI community has requested VPN experiments – first as the need to detect blocks at the level of a given protocol, then as requests to test the effectiveness of different VPN products across countries. The research questions behind the design of such experiments range from simple to complex, but they all ultimately boil down to a fundamental, user-centric question: can a particular VPN product be used at a given time and network? My research produced two simple building blocks to get the Internet freedom community one step closer to partial answers.

In collaboration with OONI, I added two new network experiments to the OONI Probe codebase that allow probes to connect to a VPN endpoint. The two experiments add basic OpenVPN and Wireguard support, two prevalent protocols offered by many commercial VPN providers with open-source code on the client side (the OpenVPN case also allows for the measurement of connectivity to obfuscated bridges using obfs4). Using these building blocks, future experiments can be added that incorporate specific knowledge about provider-specific configuration mechanisms.

During the design phase, the OONI team and I worked to conceptualize the metrics that a generalized VPN experiment would need to provide and the constraints that any implementation should satisfy. We then iterated on the implementation and performed a round of validation experiments from a set of vantage points to check the behavior of the experimental probes and attempt to ascertain the statistical properties of a series of contiguous measurements. The resulting data will allow us to infer facts about possible interference patterns and mechanisms in different networks. 

Our final efforts centered on the design of algorithms for processing, filtering, validating, and annotating raw measurement data. We worked to address the question of how the OONI measurement infrastructure could be used in novel ways to receive information flows originating from sources other than OONI Probes, as part of the process that converts individual data points into aggregate knowledge about possible censorship capabilities. In so doing, we now suggest a standards-compliant way in which interference data from actual VPN applications can enrich OONI observations via either direct submission or VPN provider aggregation.

Building VPN Capabilities into OONI Probe

The OONI Probe is a well-known Internet monitoring tool used to collect data about censorship and network interference around the world. It consists of a software client that can be run on desktop or mobile devices, as well as a suite of experiments designed to measure connectivity and gather information about network operations that can be used to detect censorship. By running these tests, the OONI Probe can detect when network operators are blocking or tampering with online traffic – providing valuable data to researchers, journalists, and human rights advocates. The OONI Probe is open source and designed to be accessible and easy to use.

Protocol Support

A censor typically blocks circumvention tools during one (or more) of four distinct stages: access to the tool itself, blocking of APIs, connecting to the tunnel, and maintaining a usable VPN connection. Measuring the first two stages is already possible via OONI’s Web Connectivity experiment. Measuring the latter two stages, however, is a bit more complex. Assessing VPN tunnel initiation requires connecting to actual endpoints in a way that behaves as much as possible as the reference implementation – and defining a usable connection is even more nuanced. These two measurements were thus our team’s areas of focus.

From the start, our team realized that adding Wireguard support would be more straightforward than adding OpenVPN support due to several design choices, such as avoiding linking against external libraries and not requiring administrative privileges. After initial experimentation, we opted for a clean-room implementation of a small subset of the OpenVPN protocol that could be integrated easily into OONI Probe. With support from OTF’s Red Team Lab, the initial minivpn implementation received a security audit by 7asecurity (the report is now public).

From a global user standpoint, we also wanted to develop support for at least one Fully Encrypted Protocol, given that plain OpenVPN or Wireguard will not work or will be quickly blocked in specific regional contexts, such as behind the Great Firewall of China (GFW). We selected obfs4 for this purpose because it is widely used for Tor bridges and offered by our two collaborating providers.

Measurement Metrics

As mentioned, the existing primitives already offered by the OONI Probe Engine make it possible for users to measure the first two stages of censor blocking activity by detecting TCP or DNS interference on (1) the distribution channel on the web, or (2) the APIs commonly used for endpoint discovery and initialization.

When measuring the third stage, tunnel initialization, OpenVPN differs from Wireguard: Wireguard runs over UDP and completes its handshake in a single round trip, whereas the OpenVPN handshake proceeds through several distinct stages. Our efforts therefore sought to register timestamps for each stage of the OpenVPN handshake in order to gather statistical information about unsuccessful handshakes. Regarding the fourth stage, what constitutes a “usable tunnel” is open to discussion, but it was nonetheless clear that we wanted to test the connection for a period of time because the tunnel may be interrupted or degraded after an initially successful start.
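To make the idea of per-stage timestamps concrete, here is a minimal sketch of how a list of handshake events might be reduced to a single outcome per measurement. The `(stage, seconds)` tuple layout and the function name are assumptions made for illustration and do not reflect the actual test keys emitted by the probe; the 0 to 8 stage numbering follows the handshake figure described later in this post.

```python
from typing import Iterable, Tuple

# Stage numbering as described for the handshake figure in this post:
# 0 is the initial state, 8 marks a completed OpenVPN connection.
FINAL_STAGE = 8


def handshake_outcome(events: Iterable[Tuple[int, float]]) -> dict:
    """Reduce (stage, seconds_since_start) events to one summary.

    The event layout is hypothetical, not the probe's real output format.
    """
    last_stage, last_t = 0, 0.0
    for stage, t in sorted(events, key=lambda e: e[1]):
        # Only record forward progress; repeated stages keep their first time.
        if stage > last_stage:
            last_stage, last_t = stage, t
    return {
        "success": last_stage == FINAL_STAGE,
        "last_stage": last_stage,
        # Completion time on success, otherwise time of the last progress seen.
        "elapsed_s": last_t,
    }


# A handshake that stalls after reaching stage 3.
stalled = handshake_outcome([(1, 0.05), (2, 0.12), (3, 0.30)])
print(stalled)
```

Grouping the `last_stage` values of failed handshakes by network is one possible way to look for the stage at which interference tends to occur.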

A final design constraint for our work was our desire to standardize a test that did not take very long for regular OONI probes to run – but which still provided sufficient information, even when sampling a given network in a sparse manner. After some discussion, the team decided to record a few ICMP echo replies received over the tunnel to verify that the gateway was routing our traffic (and then obtain a measure of latency); and then fetch a small number of webpages to retrieve a few data points about the usable download speed. Notably, these parameters are fixed by default, but they also can be parametrized when using miniooni to enable different types of custom experiments to be performed.
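As a rough illustration of the usability metrics described above, the sketch below summarizes a handful of ping replies and page fetches into loss, latency, and throughput figures. The data layout, field names, and units are assumptions chosen for the example; the experiment itself may record and aggregate these values differently.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence, Tuple


@dataclass
class TunnelSample:
    # Round-trip times in milliseconds; None marks a missing echo reply.
    ping_rtts_ms: Sequence[Optional[float]]
    # (bytes_received, seconds_elapsed) for each page fetched over the tunnel.
    fetches: Sequence[Tuple[int, float]]


def summarize(sample: TunnelSample) -> dict:
    """Compute hypothetical usability metrics for one tunnel measurement."""
    sent = len(sample.ping_rtts_ms)
    replies = [r for r in sample.ping_rtts_ms if r is not None]
    total_bytes = sum(n for n, _ in sample.fetches)
    total_secs = sum(s for _, s in sample.fetches)
    return {
        "ping_loss": (sent - len(replies)) / sent if sent else None,
        "median_rtt_ms": median(replies) if replies else None,
        "download_kbit_s": total_bytes * 8 / 1000 / total_secs if total_secs > 0 else None,
    }


summary = summarize(TunnelSample(
    ping_rtts_ms=[40.0, None, 60.0],
    fetches=[(125_000, 1.0), (125_000, 1.0)],
))
print(summary)
```

With only a few pings and fetches per run, each individual summary is noisy; the intent is that many sparse samples from the same network can be combined later.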

Real-world Providers

A key point in the experimental design is that our efforts were primarily focused on measuring “real-world” infrastructure. A tunnel against a random, newly configured endpoint can be used for five minutes or five weeks, and in the long run censors may be able to detect encrypted traffic when several conditions are met (this could be when many people are using a given VPN gateway, but also depends on where the gateway is located). This means that factors like the freshness of an endpoint, the reputation of its address block, and its relative popularity will likely play a role in predicting the likelihood of a given endpoint being blocked by a given censor. As a result, we were interested in empirically assessing the importance of each of these factors after reaching a steady influx of measurements.

Yet even in the calibration phase we needed to consider two key design implications. First, credentials for a real-world VPN system need to be distributed in a way that protects against abuse. Second, when conducting our research we needed to coordinate with partnering VPN providers to avoid inadvertently making resource enumeration easier. Throughout my ICFP fellowship, I collaborated with TunnelBear and RiseupVPN to gather a few months of data from a number of vantage points. This small dataset was used to calibrate the experiment results and perform preliminary analysis.

Conducting VPN Research with OONI Data

Running the Experimental VPN Probe

The steps of our measurement flow are described below. This particular experiment uses oonirun descriptor files.

1. A researcher publishes the spec for a particular experiment (example). This experiment descriptor is a simple JSON file published on GitHub, and it contains the address for a VPN endpoint as well as valid credentials to connect to it. The researcher then distributes the URL for the experiment.

2. A different experiment descriptor can also be used to probe regular web connectivity to the API endpoints for the VPN provider. This can uncover attempts to block VPN app initialization via TCP or DNS blocks against the API endpoints. In the future, this will be the initial phase of the same provider-based test (as is done today in the RiseupVPN experiment).

3. The miniooni probe attempts to establish the VPN tunnel with the credentials passed in Step One. For OpenVPN, this can be a regular handshake or a handshake over an obfuscated tunnel, and timing information is recorded for each step in the handshake. After the handshake, three ICMP pings are measured and then a webpage is fetched to verify that the tunnel is in a usable state.

4. The probe sends the experiment results to the OONI collector, and they are later made available through the API (example).

5. A data scientist runs analysis pipelines over the OONI Data and obtains new insight.

Figure 1: An example of the test keys collected by the OpenVPN experiment.

A report with the findings from the calibration runs, and longer runs of upgraded versions of the VPN experiments, will be submitted for publication in a peer-reviewed journal in the coming months.

Sample questions that specific experiments tried to answer include:

  • What is the failure rate per endpoint, source ASN, and protocol? How does this failure rate evolve over time?
  • When an obfs4 endpoint is blocked from a given ASN, will the tunnel still be blocked if the connection is attempted through a transparent TCP proxy at a different IP address that forwards the traffic to the original endpoint?
  • Is there evidence for the distribution of fully obfuscated traffic with different statistical properties than the distribution from a regular tunnel? 
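The first question above can be approached with a simple aggregation. The following is a minimal sketch, assuming each measurement has already been flattened into a dictionary with `endpoint`, `probe_asn`, `protocol`, and `success` keys; these key names are illustrative and are not the actual OONI measurement schema. The addresses and ASNs use ranges reserved for documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (endpoint, probe ASN, protocol)


def failure_rates(measurements: Iterable[dict]) -> Dict[Key, float]:
    """Return the fraction of failed attempts per (endpoint, ASN, protocol)."""
    totals: Dict[Key, int] = defaultdict(int)
    failures: Dict[Key, int] = defaultdict(int)
    for m in measurements:
        key = (m["endpoint"], m["probe_asn"], m["protocol"])
        totals[key] += 1
        if not m["success"]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


sample = [
    {"endpoint": "192.0.2.1:1194", "probe_asn": "AS64500", "protocol": "openvpn", "success": False},
    {"endpoint": "192.0.2.1:1194", "probe_asn": "AS64500", "protocol": "openvpn", "success": True},
    {"endpoint": "192.0.2.1:1194", "probe_asn": "AS64501", "protocol": "openvpn", "success": True},
]
rates = failure_rates(sample)
print(rates)
```

Adding a date bucket (for example, the measurement day) to the key would give one way of following how the failure rate evolves over time.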

User Note: Until the corresponding pull request has been merged on GitHub, a convenience container has been created for a custom miniooni build. This container may be useful for anyone who wants to replicate these experiments prior to the merge. Be advised, however, that this is an unofficial build and the subsequent user is responsible for keeping it up to date. Usage of the tool should be identical to the miniooni binary.

Preliminary Findings 

While trying to answer whether a significant classification of censorship tactics is viable (or meaningful) on a country-by-country basis, it must be noted that any claim that generalizes beyond the level of an individual network should be treated very carefully.

In this regard, our statistical analysis produced some preliminary results that, for the time being, should be taken with a pinch of salt. First, the coverage from the vantage points used for the experiments cannot be considered representative or exhaustive. Second, the probes are still experimental, and we would like to ensure that the dataset backing any assertion is curated to be as robust as possible and that no result is the side effect of an experimental artifact.

With the caveat that these insights still need to be fully confirmed by a more steady measurement inflow, I can share the following preliminary highlights from my research:

  • The methods, or combination of methods, used to block access to configuration APIs offered by a VPN provider seem consistent for the same provider across the networks probed in the same country during the measured time frame (for example, serving a block page vs. DNS interference or TCP blocking). Some providers, however, are more affected than others by blocks from the sampled networks (see graphs below for a comparison of probing the APIs for three VPN providers).
  • The residual distribution of failure modes seems to suggest distinctive signatures across countries. Further work is needed, however, to ensure this indication is not an experimental artifact. 
  • When analyzing the stage at which the VPN handshake fails, the country may be a meaningful predictor of bias in the outcome distribution. Once again, however, an experimental artifact cannot be ruled out at this stage.
  • In all the cases where a VPN obfs4 bridge was detected as blocked for a given provider, experimental data suggest an IP block against the endpoint and not a short-term dynamic classification of the encrypted flow.
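The last highlight relates to the proxy question listed earlier: retrying a blocked obfs4 endpoint through a transparent TCP proxy at a different IP address. A minimal sketch of the reasoning follows, with hypothetical names and a deliberately cautious set of verdicts. It is a simplification and not the analysis actually performed on the dataset.

```python
from enum import Enum


class Verdict(Enum):
    NOT_BLOCKED = "not_blocked"
    LIKELY_IP_BLOCK = "likely_ip_block"
    # Both paths failing is also compatible with other causes, for example
    # the relay address being blocked as well or the endpoint being down.
    POSSIBLE_FLOW_CLASSIFICATION = "possible_flow_classification"


def classify(direct_ok: bool, relayed_ok: bool) -> Verdict:
    """Compare a direct attempt with one relayed through a different IP."""
    if direct_ok:
        return Verdict.NOT_BLOCKED
    if relayed_ok:
        # Same traffic pattern, different address: the address is the trigger.
        return Verdict.LIKELY_IP_BLOCK
    return Verdict.POSSIBLE_FLOW_CLASSIFICATION


print(classify(direct_ok=False, relayed_ok=True))
```

Because a single pair of attempts can fail for unrelated reasons, a verdict like this is only meaningful when repeated over several measurements from the same network.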

Results of OONI Web Connectivity tests against the APIs for three VPN providers, along with the statistical distribution for the different stages of the OpenVPN handshake, are provided below for further review.

Figure 2: Results of Web Connectivity tests against the API for Mozilla VPN.

Figure 3: Results of Web Connectivity tests against the API for TunnelBear VPN.

Figure 4: Results of Web Connectivity tests against the API for RiseupVPN. 

Figure 5: Statistical distribution for the different stages of the OpenVPN handshake, aggregated across all transport modes. The horizontal axis shows the total bootstrap time (in seconds); the measurement count is displayed on the vertical axis. Stage 0 is the initial state, and stage 8 indicates the completion of a successful OpenVPN connection.

Musings on OONI Data, Providers, and the Bi-Directionality of Information Flows 

Once all components for ingesting measurements are in place, the real fun begins. The value of doing collaborative research with Open Data becomes tangible when you can aggregate and query the dataset, and act upon the knowledge you extract from it. As an example, a VPN provider can fetch relevant data to integrate observations coming from OONI experiments with its own knowledge base, possibly correlating them with its own time series for country-level and per-protocol baselines and trends.

The main driver of my research was to offer a new capability to the broader VPN circumvention community, including not only researchers and protocol developers but also providers. Going forward, a small provider does not need to set up its own monitoring infrastructure, and can consume data from the public OONI API to learn if a given VPN/obfuscation protocol is blocked from a certain network. This allows for more targeted and therefore less noisy tunnel attempts, among other benefits. If it is known beforehand that a protocol will be blocked in a certain restrictive network, the VPN client can proactively choose to attempt only stealthier protocols in an effort to avoid being flagged by a censor that is trying to detect connections using known protocols and block lists.
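As an illustration of that last point, here is a minimal sketch of how a client might narrow down which protocols to attempt, given per-protocol failure rates that a provider has already derived for the user's network. The function name, the protocol labels, and the 0.8 threshold are assumptions for the example; the sketch does not query the OONI API itself.

```python
from typing import Dict, List, Sequence


def pick_protocols(
    preference: Sequence[str],
    failure_rate: Dict[str, float],
    threshold: float = 0.8,
) -> List[str]:
    """Keep protocols whose observed failure rate is below the threshold.

    `preference` is ordered from the default choice to the stealthiest one.
    Protocols without data are kept, since nothing suggests they are blocked.
    """
    usable = [p for p in preference if failure_rate.get(p, 0.0) < threshold]
    # If everything looks blocked, still try the stealthiest option.
    return usable or list(preference[-1:])


order = ["wireguard", "openvpn", "openvpn+obfs4"]
observed = {"wireguard": 0.95, "openvpn": 0.90, "openvpn+obfs4": 0.10}
print(pick_protocols(order, observed))
```

Skipping protocols that are known to fail avoids emitting traffic that a censor can easily fingerprint before the client falls back to an obfuscated transport.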

But information can flow in the other direction, too. How to define best practices for sharing data originating from real-world VPN apps is a long-standing question in the community, while the need to automate parts of the analysis as close to real-time as possible is often highlighted during moments of crisis, such as the massive blocking by the GFW or the Iranian Firewall in October 2022. Together with members of Censored Planet and Outline, we organized a panel at the 2023 OTF Summit in which we tried to understand various needs from the point of view of different stakeholders, including users, media, providers, developers, and policymakers. The findings from the VPN Community Initiative in 2022 also shed some light on how some of these challenges can be mapped out.

Motivated by the insights from these conversations, and suggestions from members of the Outline team, I authored a proposal for a new OONI data format for external VPN data inspired by the Network Error Logging mechanism (NEL). NEL is slowly gaining traction as a way for browsers to report failures to designated endpoints that are communicated via a special HTTP header. The idea is that a universal reporting mechanism can be defined to enable circumvention tools to report directly to OONI endpoints, while acknowledging that providers are ultimately the natural agent to aggregate and submit failure reports on behalf of their direct users.

Notably, exposing failure rates for a diversity of infrastructure endpoints and techniques poses problems for which there are no obvious solutions, like how to limit resource enumeration, preserve user privacy, or trust data sources for which there is no practical provenance. Deep technical and tactical discussion is needed to address these and other concerns, but even a basic implementation will be a good point to start exploring its usefulness.
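To give a feel for what such a report could look like, the sketch below builds a coarse failure report loosely modeled on the general shape of NEL reports (a `type` plus a `body`). Every field name here is hypothetical and this is not the format from the proposal. The example also shows two of the simplest mitigations for the concerns above: omitting the endpoint address and truncating the timestamp.

```python
import json
from datetime import datetime, timezone


def build_report(protocol: str, phase: str, error: str,
                 probe_cc: str, probe_asn: str, when: datetime,
                 sampling_fraction: float = 1.0) -> dict:
    """Build a coarse, NEL-like failure report (hypothetical field names).

    `when` must be timezone-aware. The endpoint address and the client IP
    are deliberately left out to limit enumeration and protect users.
    """
    # Truncate to the hour so individual sessions are harder to single out.
    hour = when.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return {
        "type": "vpn-error",
        "observed_at": hour.isoformat(),
        "probe_cc": probe_cc,
        "probe_asn": probe_asn,
        "body": {
            "protocol": protocol,
            "phase": phase,
            "error": error,
            "sampling_fraction": sampling_fraction,
        },
    }


# "ZZ" and AS64500 are placeholder values, not real observations.
report = build_report("openvpn+obfs4", "handshake", "timeout", "ZZ", "AS64500",
                      datetime(2022, 10, 1, 13, 47, 12, tzinfo=timezone.utc))
print(json.dumps(report, indent=2))
```

Whether this level of coarsening is enough, and how a collector could trust the source of such reports, are exactly the open questions that need further discussion.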

Looking to the Future

The oonirun abstraction makes it possible today to run the experimental openvpn and wireguard protocol probes and keep iterating over them. With some known limitations, these new nettests have already been demonstrated to be a useful foundation to perform custom, targeted experiments.

The path to their adoption by the main OONI Probes starts with merging these patches, but two key pieces remain outstanding. First, existing code (鳥居) needs to be shaped into a proper experiment coordinator, including VPN credential distribution to probe a known subset of the infrastructure of our partnering VPN providers. Second, there is also a strong desire to keep expanding the OpenVPN functionality covered by minivpn (data channel encryption, for instance), increase the number of obfuscation proxies that can be probed out of the box (for openvpn and wireguard, but also potentially for different protocols), and run an NDT Speed Test over the configured VPN tunnel.

Shepherding the current probes to the proverbial finish line and securing mass adoption by the wonderful community that runs and contributes to OONI Probe will allow everyone to see a more complete map of what is blocked and which evasion techniques work best. Critically, these efforts will also allow users to start exploring other open questions quantitatively, like the extent of server-side blocking or even where and how VPN throttling may be taking place. Indeed, there will likely be many new insights waiting . . . on the other side of the tunnel.

About the program: OTF’s Information Controls Fellowship Program (ICFP) supports examination into how governments in countries, regions, or areas of OTF’s core focus are restricting the free flow of information, impeding access to the open Internet, and implementing censorship mechanisms, thereby threatening the ability of global citizens to exercise basic human rights and democracy. The program supports fellows to work within host organizations that are established centers of expertise by offering competitively paid fellowships for three, six, nine, or twelve months in duration.