Dissecting the Shanghai Police Data Leak

Sifting through rumors, known facts, and how the cloud ties into it all

Jul 10, 2022

[NOTE: This article was updated on 7/14/2022 to reflect recent developments regarding the Shanghai government’s investigation of the leak, as well as an internal investigation at Alibaba.]

This is the first article in a two-part series analyzing the recently publicized leak of a Chinese public security database.

The debut of Root Access follows what may be the largest recorded data leak of all time.

Last weekend, word spread about the theft of nearly one billion records from the “Shanghai National Police Database.” The connection to the cloud? The data appears to have been hosted on a private Alibaba Cloud server.

As part one of this series, this week’s article provides an overview of the data leak and its possible causes. The next article in this series will take a larger look at the usage of cloud technology in China’s public security apparatus as a whole.

What We Know About the Data Leak

Note: Much of the discussion and reporting around this incident relies on speculation, and I have cited sources accordingly. I plan to update this article as further details solidify.

How the Story of the Leak Went Public

Public awareness of the leak began on July 3 when Changpeng Zhao, founder and CEO of the cryptocurrency exchange Binance, announced that the company had discovered “1 billion resident records for [sale] [on] the dark web” originating from an unidentified country in Asia.

One day later, Kendra Schaefer of consulting firm Trivium China provided more clarification through a Twitter thread that quoted Zhao’s original tweet.

Kendra Schaefer 凯娜 @kendraschaefer

If you're not following this, you should be: word on the social media street is that China's police force (MPS - Shanghai) database was hacked, with the personal information and case records of 1 billion citizens, and the records are for sale on Telegram - 23TB of data. 1/7

CZ 🔶 Binance @cz_binance

Our threat intelligence detected 1 billion resident records for sell in the dark web, including name, address, national id, mobile, police and medical records from one asian country. Likely due to a bug in an Elastic Search deployment by a gov agency. This has impact on ...

Reportedly, a police database in Shanghai had been hacked, and 1 billion records containing citizens’ data [1] were for sale on Telegram. The source of the data was possibly China’s Ministry of Public Security (MPS). A user named ChinaDan had put the data up for sale on BreachForums, a platform for sharing and selling hacked data, and included a sample of about 750,000 records.

Below are screenshots of the BreachForums post and sample data.

Tuomas Lin Li @TuomasLinLi

@lurenbian When I first heard about this I only half believed it, but it turns out to be true. All we can do is take our own precautions.

How the Data Was Likely Accessed

On July 6, the Wall Street Journal dropped a bombshell when it stated that the data was likely accessed through a publicly accessible dashboard without password protection [2].

According to a technical report from the data leak monitoring platform LeakIX, the dashboard in question was an unprotected Kibana instance — a user interface for viewing and editing Elasticsearch data.

The instance ran on the default Kibana port and was exposed via the default Kibana endpoint used by Alibaba for Elasticsearch deployment on public networks. With Alibaba Cloud, deployment to a public network is the default option. Although Alibaba provides a default username and password for Elasticsearch deployments, the service used an older version of Elasticsearch without default password protection.
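To make the failure mode concrete, here is a minimal sketch of how an administrator might check whether one of their own endpoints answers requests that carry no credentials. The function names, verdict labels, and the example URL are hypothetical and chosen for illustration; they do not come from the LeakIX report or from Alibaba Cloud documentation.

```python
import urllib.error
import urllib.request


def classify_exposure(status_code: int) -> str:
    """Map the HTTP status of an unauthenticated request to a rough verdict.

    The labels are hypothetical names used only in this sketch.
    """
    if status_code in (401, 403):
        return "auth-required"   # the server refused the anonymous request
    if 200 <= status_code < 300:
        return "open"            # the server answered without credentials
    return "inconclusive"        # anything else needs a manual look


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one request with no credentials to an endpoint you administer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_exposure(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_exposure(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


if __name__ == "__main__":
    # Assumed placeholder address; replace with a host you are responsible for.
    print(probe("http://localhost:5601/api/status"))
```

This is only a first-pass hint. A login page served with a 200 status would also be reported as "open", so the sketch assumes you probe an API path rather than the landing page, and that you only test systems you are authorized to audit.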

Cybersecurity researcher Vinny Troia speculated that this Kibana instance may have been set up as an intentionally unprotected back door into the data, meant for use by a small number of people. Regardless of why it was set up, the configuration and deployment of this Kibana instance were extremely careless and demonstrate either a lack of proper security knowledge or simply an unwillingness to apply it.

While many of these factors appear to be linked to poor engineering practices on the part of the users rather than Alibaba, subsequent investigation has uncovered some questionable practices regarding Alibaba’s deployment of the database.

Karen Hao 郝珂灵 @_KarenHao

Experts say the database, hosted on Alibaba Cloud, also suffered several other security problems—part of a pattern that matched 13 more databases hosted by the company. Authorities have now called in Alibaba executives and an internal investigation is undergoing, employees say.

One issue in particular concerns an expired security certificate for the database, which indicates that the database had not been properly maintained for years. This certificate was also shared with thirteen other databases.

Karen Hao 郝珂灵 @_KarenHao

Cybersecurity experts also found that the database was using an expired security certificate, a unique digital identifier used to encrypt web traffic that has become standard practice. Alibaba deployed it in September 2017 but hadn't renewed it since September 2018.
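For readers unfamiliar with certificate expiry, the check involved is simple date arithmetic. The sketch below assumes you already have a certificate's "notAfter" string in the format Python's ssl module uses; the specific date is an illustrative placeholder consistent with "September 2018", not the actual value from the database's certificate.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before the certificate's notAfter time (negative if expired)."""
    # cert_time_to_seconds parses strings such as "Sep 15 00:00:00 2018 GMT".
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


# Illustrative values only: an assumed expiry date and an assumed check date.
not_after = "Sep 15 00:00:00 2018 GMT"
checked_on = datetime(2022, 7, 1, tzinfo=timezone.utc)

remaining = days_until_expiry(not_after, checked_on)
print("expired" if remaining < 0 else "valid")
```

A negative result years in size, as in this example, is the kind of signal that suggests nobody has been maintaining the deployment.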

When the Leak Occurred

The WSJ article cites Bob Diachenko of cybersecurity firm SecurityDiscovery as claiming that the police database had been exposed for over a year, from April 2021 through mid-June 2022. Diachenko recently discussed the leak with CNN:

In mid-June, [Diachenko’s] company detected that the database was attacked by an unknown malicious actor, who destroyed and copied the data and left a ransom note demanding 10 bitcoin for its recovery, Diachenko said.
It is not clear if this was the work of the same person who advertised the sale of the database information last week.
By July 1, the ransom note had disappeared, according to Diachenko, but only 7 gigabytes (GB) of data was available -- instead of the 23 TB originally advertised.
Diachenko said it suggested the ransom had been resolved, but the database owners had continued to use the exposed database for storing, until it was shut down over the weekend.

What the Data Contains

What do we know about the data stored in the police database?

According to Diachenko, the size of the vulnerable data cluster was 26.4 terabytes. ChinaDan’s original post on BreachForums claimed that the data encompassed billions of Chinese citizens, while many media outlets cite a figure of 1 billion people, although they tend to hedge this estimate with ambiguous language [3].
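One reason for the hedging is that a row count is only an upper bound on the number of distinct people, since the same person can appear in several rows. A minimal sketch of the distinction follows; the field name "national_id" and the four toy rows are made up for illustration and do not reflect the leaked data's actual schema or contents.

```python
from typing import Iterable, Mapping


def count_rows_and_people(
    rows: Iterable[Mapping[str, str]], id_field: str = "national_id"
) -> tuple[int, int]:
    """Return (number of rows, number of distinct values of id_field)."""
    total = 0
    seen = set()
    for row in rows:
        total += 1
        seen.add(row[id_field])
    return total, len(seen)


# Made-up placeholder rows: the ID "A001" appears twice.
sample = [
    {"national_id": "A001", "note": "first entry"},
    {"national_id": "A002", "note": "second entry"},
    {"national_id": "A001", "note": "duplicate of the first ID"},
    {"national_id": "A003", "note": "fourth entry"},
]

row_count, person_count = count_rows_and_people(sample)
print(row_count, person_count)
```

Here four rows describe three distinct IDs, which is why a figure like "nearly 970 million rows" translates to "as many people" only if there are no duplicates.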

Some have analyzed the 750,000 records in the purported sample shared by ChinaDan.

According to CNN, the sample data spans nearly two decades, from 2001 to 2019. A small number of the sample entries have been verified [4].

Corroborating Diachenko’s suggestion that the data ransom was resolved, BreachForums posted on July 6 that the data was no longer being sold and that all related posts had been deleted. The post was written in Chinese with an English version provided for “curious English users.”

Interestingly, the Chinese and English phrasing used in the post differ in one key regard. The Chinese version states that the data was sold (“数据已经出售完成”), whereas the English version phrases it more ambiguously: “The data is no longer being sold.”

China’s Reaction to the Leak

China’s government and media have not yet officially commented on the Shanghai police data leak. Meanwhile, public discussion of the leak has been muffled, with Weibo blocking users from searching for the topic.

This is possibly due to a combination of factors, such as the speculative nature of many details surrounding the leak, the scale of the leak, and China’s strong stance against spreading rumors online for the preservation of public order [5].

While posts on the leak do not appear to be searchable on Weibo, several articles discussing the incident can be found on WeChat. In addition, the autocomplete suggestions that appear when terms such as “Shanghai police” are entered suggest that many users on Weibo and WeChat are searching for information about the leak.

On July 14, the Wall Street Journal reported that executives from Alibaba Cloud, including its VP Chen Xuesong, had been called in to discuss the matter with Shanghai authorities.

While it’s uncertain how this leak will ultimately be handled, China has recently tightened legal protections and regulations for personal data as part of its Personal Information Protection Law, which also applies to government bodies.

How Alibaba Is Reacting to the Leak

As of July 14, Alibaba is conducting an internal investigation into the incident, according to the Wall Street Journal:

Senior managers from Alibaba and its cloud unit gathered virtually to formulate an emergency response on July 1, after an anonymous seller posted an advertisement for the data and provided a sample of it in a cybercrime forum, according to people briefed on the meeting…
Since the theft was discovered, Alibaba engineers have temporarily disabled all access to the breached database and have begun inspecting related code, some employees familiar with the response said. The reasons for the breach haven’t yet been determined, they said…
As the investigation continued, Alibaba Cloud ordered staff to review details such as the database architecture and configurations in contracts with key clients, especially those with dedicated private cloud resources such as government agencies and financial institutions, according to employees familiar with the matter and a cloud customer.

The second part of this series will focus on the role of cloud technology in China’s public security system.

[1] According to the Wall Street Journal, the records contain citizens’ “names, government ID numbers, phone numbers and incident reports.”

[2] Changpeng Zhao had previously suggested two possible causes: 1) a bug in the system’s Elasticsearch deployment, and 2) a blog post that leaked database access credentials. His first guess was not far off the mark, as the dashboard in question, Kibana, is a UI for viewing Elasticsearch data. However, the cause of the leak appears to be a lack of proper security rather than a bug.

[3] The WSJ says that the data covers “as many as a billion individuals,” and that one file contains nearly 970 million rows, “which suggest it includes details on just as many people, assuming no duplicate entries” (emphasis mine). CNN says that a downloaded index of the database “appears to contain information on nearly 970 million Chinese citizens.”

[4] In Australia, ABC contacted 20 Chinese individuals included in the data set and also noted that the data of Australian citizens was present. Karen Hao of the WSJ stated that five people verified that their listed details were correct.

[5] Check out China’s Cybersecurity Law for more details regarding this last point. (Original and English translation)
