Source contract · Methodology · Beta dataset v1

How the dataset is built and what it does not claim

The China Semiconductor Tooling Talent Atlas is an editorial evidence product. This page describes what the atlas measures, what it deliberately does not measure, and the contract this product keeps with the public sources behind every row.

01

What the atlas measures

The atlas catalogues public records for semiconductor-equipment talent in mainland PRC. Each row is anchored to a public source and tagged by tooling segment, evidence type, and confidence. The catalogue is built to answer a specific kind of question: where, and through what kind of record, does the talent layer behind a tooling segment become visible at all?

Concretely, the dataset combines four kinds of artefact:

  • Filings and corporate disclosures (annual reports, prospectuses).
  • Government and institutional records (policy notices, shortage catalogues, university and key-lab pages, industrial-park directories).
  • Analytical proxies derived from public sources (job-posting language, research-output indicators, expert secondary writing).
  • Taxonomy scaffolding that organises the four tool families and their role and capability vocabulary.

02

Out of scope

The atlas makes no labour-market census, capability-index, or ranking claims. In particular, the following are out of scope:

  • Headcount estimates of segment-specific workforces.
  • Yield-learning depth, process-window judgement, or chamber and calibration know-how.
  • Comparative readiness, market share, or competitive position.
  • Forecasts of future capability, hiring, or shortage.
  • Any individual-level personal data. The atlas operates at firm, institution, city, segment, capability, and discipline levels only.

A row that appears to support one of those claims is being read past its evidence. The explorer’s audit panel exposes the underlying source for each row.

03

Evidence types

Every observation carries a single evidence_type value. These values roll up into three analytical groups: direct public record (the strongest), analytical proxy (supporting context), and taxonomy scaffold (navigational only). Counts below reflect the current beta dataset.

| Label | Raw value | Group | Definition | Rows |
|---|---|---|---|---|
| Industry presence | industry_presence | Direct public record | Firm, industrial park, supplier cluster, or manufacturing ecosystem. | 77 |
| Institutional capacity | institutional_capacity | Direct public record | Degree authorization, school, lab, research centre, or key discipline. | 14 |
| Official policy | official_policy | Direct public record | Policy document or government announcement. | 3 |
| Shortage signal | shortage_signal | Direct public record | Shortage list, talent catalogue, recruitment policy, or labour-market signal. | 17 |
| Expert secondary source | expert_secondary | Analytical proxy | Think-tank, consulting, industry report, or academic synthesis. | 12 |
| Job posting proxy | job_posting_proxy | Analytical proxy | Role demand inferred from job postings. | 4 |
| Research output proxy | research_output_proxy | Analytical proxy | Papers, patents, grants, standards, or labs. | 16 |
| Manual taxonomy scaffold | manual_inference | Taxonomy scaffold | Analyst-coded inference linking taxonomy to evidence. Scaffolding, not real-world activity. | 65 |

Any evidence_type value not listed above is assigned to the analytical-proxy group by default in the explorer.
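A minimal sketch of how the roll-up could be computed, assuming each row exposes its raw evidence_type string. The group identifiers (direct_public_record and so on) and the function names are hypothetical; only the raw values, group assignments, default rule, and row counts come from the table above.

```python
from collections import Counter

# Hypothetical group identifiers; assignments follow the evidence-type table.
EVIDENCE_GROUPS = {
    "industry_presence": "direct_public_record",
    "institutional_capacity": "direct_public_record",
    "official_policy": "direct_public_record",
    "shortage_signal": "direct_public_record",
    "expert_secondary": "analytical_proxy",
    "job_posting_proxy": "analytical_proxy",
    "research_output_proxy": "analytical_proxy",
    "manual_inference": "taxonomy_scaffold",
}

# Unlisted evidence types fall back to the analytical-proxy group.
DEFAULT_GROUP = "analytical_proxy"

# Row counts per raw value, copied from the beta dataset table.
BETA_COUNTS = {
    "industry_presence": 77,
    "institutional_capacity": 14,
    "official_policy": 3,
    "shortage_signal": 17,
    "expert_secondary": 12,
    "job_posting_proxy": 4,
    "research_output_proxy": 16,
    "manual_inference": 65,
}


def evidence_group(evidence_type: str) -> str:
    """Return the analytical group for a raw evidence_type value."""
    return EVIDENCE_GROUPS.get(evidence_type, DEFAULT_GROUP)


def group_counts(type_counts: dict[str, int]) -> Counter:
    """Roll per-type row counts up into per-group totals."""
    totals = Counter()
    for evidence_type, rows in type_counts.items():
        totals[evidence_group(evidence_type)] += rows
    return totals


if __name__ == "__main__":
    print(group_counts(BETA_COUNTS))
```

Run against the beta row counts, the roll-up gives 111 direct-public-record rows, 32 analytical-proxy rows, and 65 taxonomy-scaffold rows, 208 in total.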

04

Confidence levels

Confidence describes the link between source and observation, not the importance of the observation. A row can be highly relevant editorially while only carrying medium confidence.

| Label | Raw value | Definition |
|---|---|---|
| High | high | The source directly supports the observation. |
| Medium | medium | The source supports the observation through a reasonable proxy. |
| Low | low | The source is relevant but indirect, promotional, outdated, or incomplete. |

05

Evidence strength versus capability

Public records expose what firms, institutions, and government bodies choose to disclose. They do not expose chamber recovery, customer-ramp judgement, calibration habits, or the quiet learning that makes a tool work in a fab. A segment can look strong because its firms file thoroughly, or weak because its firms publish less. Neither pattern measures engineering depth.

For that reason the atlas avoids composite scores. Each tier of evidence is reported on its own terms. When the homepage refers to “visibility,” it means disclosure coverage in current public records, not engineering capability.
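The no-composite-score rule can be sketched as a report that tallies rows per evidence group and per confidence level without weighting or summing across tiers. The field names evidence_type and confidence, the helper names, and the example rows are assumptions made for illustration; the group sets mirror the evidence-type table, with unlisted values defaulting to the analytical-proxy group.

```python
from collections import Counter, defaultdict

# Group membership mirrors the evidence-type table; any other value
# falls back to the analytical-proxy group, as the explorer does.
DIRECT_PUBLIC_RECORD = {
    "industry_presence",
    "institutional_capacity",
    "official_policy",
    "shortage_signal",
}
TAXONOMY_SCAFFOLD = {"manual_inference"}
CONFIDENCE_LEVELS = ("high", "medium", "low")


def group_of(evidence_type: str) -> str:
    """Map a raw evidence_type value to its analytical group."""
    if evidence_type in DIRECT_PUBLIC_RECORD:
        return "direct_public_record"
    if evidence_type in TAXONOMY_SCAFFOLD:
        return "taxonomy_scaffold"
    return "analytical_proxy"


def tier_report(rows) -> dict:
    """Count rows per evidence group and confidence level.

    Each tier is reported on its own terms: counts are never weighted,
    summed across groups, or folded into a single score.
    """
    report = defaultdict(Counter)
    for row in rows:
        confidence = row["confidence"]  # hypothetical field name
        if confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence value: {confidence!r}")
        report[group_of(row["evidence_type"])][confidence] += 1
    return {group: dict(counts) for group, counts in report.items()}


# Hypothetical rows, for illustration only.
example_rows = [
    {"evidence_type": "industry_presence", "confidence": "high"},
    {"evidence_type": "industry_presence", "confidence": "low"},
    {"evidence_type": "job_posting_proxy", "confidence": "medium"},
]
print(tier_report(example_rows))
```

Keeping the output as nested counts, rather than one number per segment, preserves the distinction between the group a row belongs to and how directly its source supports it.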

06

Update cadence

  • The atlas is updated on a manual review cadence. There is no automated scraping and no automated polling of external services.
  • Every source row records a publication_date and an access_date; for the recency of any specific claim, refer to the source row via the explorer’s audit panel.
  • Observations remain in the beta dataset until they are manually verified against the primary source. The explorer header marks this status; the underlying CSV is treated as staging.
  • New rows are added in batches, not continuously. Expect the published snapshot to lag the original filings by weeks rather than days.
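A minimal sketch of a staging check a reviewer might run before a batch is published, assuming the staging CSV stores publication_date and access_date as ISO 8601 dates. The source_id column, the sample rows, and the function name are hypothetical and do not describe the atlas's actual files.

```python
import csv
import io
from datetime import date

# Hypothetical staging rows, for illustration only.
SAMPLE_CSV = """source_id,publication_date,access_date
S1,2023-04-28,2024-05-02
S2,2024-06-01,2024-05-30
S3,2024-01-15,
"""


def check_source_row(row: dict) -> list[str]:
    """Return the problems found in one source row (empty list if none)."""
    try:
        published = date.fromisoformat(row["publication_date"])
        accessed = date.fromisoformat(row["access_date"])
    except (KeyError, TypeError, ValueError) as exc:
        # Missing column, missing value, or a string that is not an ISO date.
        return [f"missing or malformed date: {exc}"]
    problems = []
    if accessed < published:
        problems.append("access_date precedes publication_date")
    return problems


for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(row["source_id"], check_source_row(row))
```

Returning a list of problems instead of raising lets a batch review surface every issue in one pass.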
07

No individual-level mapping

The atlas does not, and will not, hold individual-level personal data. Observations are anchored to one of the following entity types: firm, institution, industrial park, city, segment, capability, discipline, or role family. Named individuals appear only where a source title makes them unavoidable as bibliographic attribution. They are never a subject of analysis.

This editorial rule keeps the product focused on public evidence structure. Profiling specific engineers would turn it into a different kind of product.
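The anchoring rule lends itself to a mechanical check on each batch. This sketch assumes a hypothetical entity_type field holding snake_case forms of the entity types listed above; it rejects any row anchored elsewhere, for example to a person.

```python
# Snake_case forms of the permitted entity types (hypothetical encoding).
ALLOWED_ENTITY_TYPES = frozenset({
    "firm",
    "institution",
    "industrial_park",
    "city",
    "segment",
    "capability",
    "discipline",
    "role_family",
})


def assert_no_individual_anchor(rows) -> None:
    """Raise ValueError if any row is anchored outside the permitted entity types."""
    for index, row in enumerate(rows):
        entity_type = row.get("entity_type")  # hypothetical field name
        if entity_type not in ALLOWED_ENTITY_TYPES:
            raise ValueError(f"row {index}: entity_type {entity_type!r} is not permitted")


# Passes: both rows use permitted entity types.
assert_no_individual_anchor([{"entity_type": "firm"}, {"entity_type": "city"}])
```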

08

How to cite or use the atlas responsibly

When citing or quoting from the atlas, follow these conventions:

  • Cite the atlas as the editorial source and the underlying public record as the primary source. Do not present atlas counts as workforce totals.
  • For any specific claim, retrieve the source via the explorer’s audit panel and quote the source’s own title, publisher, publication date, and access date.
  • Preserve evidence-tier language: a row from the analytical-proxy group should not be paraphrased as a direct measurement.
  • When publishing derivative analysis, flag the same caveats this page lists, particularly the absence of tacit production know-how from the public record.

Suggested citation pattern

China Semiconductor Tooling Talent Atlas (beta dataset), entry for <city / firm / segment>, evidence type <raw value>, source: <title, publisher, publication date, access date>.
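For anyone generating citations in bulk, the pattern can be filled from a small record type. The class and field names are hypothetical, and the example values are placeholders rather than real atlas entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasCitation:
    """Fields of the suggested citation pattern (hypothetical names)."""

    entry: str             # city, firm, or segment
    evidence_type: str     # raw value, e.g. "industry_presence"
    title: str
    publisher: str
    publication_date: str
    access_date: str

    def render(self) -> str:
        """Fill the suggested citation pattern with this record's fields."""
        return (
            "China Semiconductor Tooling Talent Atlas (beta dataset), "
            f"entry for {self.entry}, evidence type {self.evidence_type}, "
            f"source: {self.title}, {self.publisher}, "
            f"{self.publication_date}, {self.access_date}."
        )


# Placeholder values, for illustration only.
example = AtlasCitation(
    entry="Example City",
    evidence_type="industry_presence",
    title="Example report title",
    publisher="Example publisher",
    publication_date="2024-01-15",
    access_date="2024-03-02",
)
print(example.render())
```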