How the dataset is built and what it does not claim
The China Semiconductor Tooling Talent Atlas is an editorial evidence product. This page describes what the atlas measures, what it deliberately does not measure, and the contract this product keeps with the public sources behind every row.
What the atlas measures
The atlas catalogues public records for semiconductor-equipment talent in mainland China. Each row is anchored to a public source and tagged by tooling segment, evidence type, and confidence. The catalogue is built to answer a specific kind of question: where, and through what kind of record, does the talent layer behind a tooling segment become visible at all?
Concretely, the dataset combines four kinds of artefact:
- Filings and corporate disclosures (annual reports, prospectuses).
- Government and institutional records (policy notices, shortage catalogues, university and key-lab pages, industrial-park directories).
- Analytical proxies derived from public sources (job-posting language, research-output indicators, expert secondary writing).
- Taxonomy scaffolding that organises the four tool families and their role and capability vocabulary.
Out of scope
The atlas makes no labour-market census, capability-index, or ranking claims. In particular, the following are out of scope:
- Headcount estimates of segment-specific workforces.
- Yield-learning depth, process-window judgement, or chamber and calibration know-how.
- Comparative readiness, market share, or competitive position.
- Forecasts of future capability, hiring, or shortage.
- Any individual-level personal data. The atlas operates only at the level of firms, institutions, industrial parks, cities, segments, capabilities, disciplines, and role families.
A row that appears to support one of those claims is being read past its evidence. The explorer’s audit panel exposes the underlying source for each row.
Evidence types
Every observation carries a single evidence_type value. These values roll up into three analytical groups: direct public record (the strongest), analytical proxy (supporting context), and taxonomy scaffold (navigational only). Counts below reflect the current beta dataset.
| Label | Raw value | Group | Definition | Rows |
|---|---|---|---|---|
| Industry presence | industry_presence | Direct public record | Firm, industrial park, supplier cluster, or manufacturing ecosystem. | 77 |
| Institutional capacity | institutional_capacity | Direct public record | Degree authorization, school, lab, research centre, or key discipline. | 14 |
| Official policy | official_policy | Direct public record | Policy document or government announcement. | 3 |
| Shortage signal | shortage_signal | Direct public record | Shortage list, talent catalogue, recruitment policy, or labour-market signal. | 17 |
| Expert secondary source | expert_secondary | Analytical proxy | Think-tank, consulting, industry report, or academic synthesis. | 12 |
| Job posting proxy | job_posting_proxy | Analytical proxy | Role demand inferred from job postings. | 4 |
| Research output proxy | research_output_proxy | Analytical proxy | Papers, patents, grants, standards, or labs. | 16 |
| Manual taxonomy scaffold | manual_inference | Taxonomy scaffold | Analyst-coded inference linking taxonomy to evidence. Scaffolding, not real-world activity. | 65 |
The explorer defaults unlisted evidence types to the analytical-proxy group.
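The grouping rule, including the fallback for unlisted values, can be sketched as a lookup table. This is a minimal illustration, not the explorer's code: the function names `evidence_group` and `group_counts` are hypothetical, and the only data used are the raw values and row counts from the table above. Group totals are kept separate rather than summed into a single score.

```python
from collections import Counter

# Raw evidence_type values and their analytical groups, as listed in the table above.
EVIDENCE_GROUPS = {
    "industry_presence": "Direct public record",
    "institutional_capacity": "Direct public record",
    "official_policy": "Direct public record",
    "shortage_signal": "Direct public record",
    "expert_secondary": "Analytical proxy",
    "job_posting_proxy": "Analytical proxy",
    "research_output_proxy": "Analytical proxy",
    "manual_inference": "Taxonomy scaffold",
}

# Mirrors the stated fallback: unlisted values land in the analytical-proxy group.
DEFAULT_GROUP = "Analytical proxy"


def evidence_group(evidence_type: str) -> str:
    """Return the analytical group for a raw evidence_type value (hypothetical helper)."""
    return EVIDENCE_GROUPS.get(evidence_type, DEFAULT_GROUP)


def group_counts(type_counts: dict[str, int]) -> Counter:
    """Roll per-type row counts up into per-group totals, reported separately."""
    totals = Counter()
    for evidence_type, rows in type_counts.items():
        totals[evidence_group(evidence_type)] += rows
    return totals


# Row counts from the current beta dataset table.
BETA_COUNTS = {
    "industry_presence": 77,
    "institutional_capacity": 14,
    "official_policy": 3,
    "shortage_signal": 17,
    "expert_secondary": 12,
    "job_posting_proxy": 4,
    "research_output_proxy": 16,
    "manual_inference": 65,
}

if __name__ == "__main__":
    for group, rows in group_counts(BETA_COUNTS).items():
        print(f"{group}: {rows}")
```

With the beta counts, this gives 111 direct-public-record rows, 32 analytical-proxy rows, and 65 taxonomy-scaffold rows (208 in total). Nearly a third of the rows are scaffolding rather than evidence of real-world activity.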
Confidence levels
Confidence describes the link between source and observation, not the importance of the observation. A row can be highly relevant editorially while only carrying medium confidence.
| Label | Raw value | Definition |
|---|---|---|
| High | high | The source directly supports the observation. |
| Medium | medium | The source supports the observation through a reasonable proxy. |
| Low | low | The source is relevant but indirect, promotional, outdated, or incomplete. |
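Because confidence is a three-level ordinal, a reader filtering rows might parse the raw values into an ordered type. The sketch below assumes that usage. The `Confidence` enum, the `at_least` helper, and the row shape (a dict with a `confidence` key) are hypothetical. The integers carry order only, not a numeric score, and they say nothing about a row's editorial importance.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Ordinal confidence in the source-to-observation link (not importance)."""

    LOW = 1
    MEDIUM = 2
    HIGH = 3

    @classmethod
    def parse(cls, raw: str) -> "Confidence":
        """Map a raw value ("high", "medium", "low") to a member; reject anything else."""
        try:
            return cls[raw.strip().upper()]
        except KeyError:
            raise ValueError(f"unknown confidence value: {raw!r}") from None


def at_least(rows: list[dict], minimum: Confidence) -> list[dict]:
    """Keep rows whose confidence meets the floor; assumes each row has a 'confidence' key."""
    return [row for row in rows if Confidence.parse(row["confidence"]) >= minimum]
```

Rejecting unknown values, rather than mapping them to a default, keeps a typo in the staging data from silently passing as a valid confidence level.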
Evidence strength versus capability
Public records expose what firms, institutions, and government bodies choose to disclose. They do not expose chamber recovery, customer-ramp judgement, calibration habits, or the quiet learning that makes a tool work in a fab. A segment can look strong because its firms file thoroughly, or weak because its firms publish less. Neither pattern measures engineering depth.
For that reason the atlas avoids composite scores. Each tier of evidence is reported on its own terms. When the homepage refers to “visibility,” it means disclosure coverage in current public records, not engineering capability.
Update cadence
- The atlas is on a manual review cadence. There is no automated scraping, and no automated polling of external services.
- Every source row records a publication_date and an access_date. For the recency of any specific claim, refer to the source row via the explorer’s audit panel.
- Observations remain in the beta dataset until they are manually verified against the primary source. The explorer header marks this status; the underlying CSV is treated as staging.
- New rows are added in batches, not continuously. Expect the published snapshot to lag the original filings by weeks rather than days.
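A reader checking recency can compare the two dates recorded on each source row. The helper below is a hypothetical sketch. It assumes both fields are stored as ISO 8601 calendar dates (YYYY-MM-DD), which this page does not specify. It measures how long after publication a source was accessed; it does not measure the lag of the published snapshot.

```python
from datetime import date


def source_lag_days(publication_date: str, access_date: str) -> int:
    """Days between a source's publication and its access by the atlas.

    Assumes both values are ISO 8601 calendar dates (YYYY-MM-DD).
    """
    published = date.fromisoformat(publication_date)
    accessed = date.fromisoformat(access_date)
    if accessed < published:
        # A source cannot be accessed before it exists; flag the row for review.
        raise ValueError("access_date precedes publication_date")
    return (accessed - published).days
```

For example, `source_lag_days("2024-03-01", "2024-04-15")` returns 45.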
No individual-level mapping
The atlas does not, and will not, hold individual-level personal data. Observations are anchored to one of the following entity types: firm, institution, industrial park, city, segment, capability, discipline, or role family. Named individuals appear only where unavoidable, as bibliographic attribution in a source title, and are never a subject of analysis.
This editorial rule keeps the product focused on public evidence structure. Profiling specific engineers would turn it into a different kind of product.
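One way to enforce this rule at ingestion is an allow-list check on each observation's anchor. The sketch below is illustrative only. The snake_case spellings such as `industrial_park` and `role_family` and the `check_anchor` function are assumptions, not the atlas's actual schema. Any individual-level type, such as a person, fails the check like any other unlisted value.

```python
# Entity types this page lists as valid anchors (spellings are assumed).
ALLOWED_ENTITY_TYPES = frozenset({
    "firm", "institution", "industrial_park", "city",
    "segment", "capability", "discipline", "role_family",
})


def check_anchor(entity_type: str) -> None:
    """Raise if an observation is anchored to anything outside the allow-list.

    Individual-level types (e.g. "person") are never in the set, so they
    are rejected here along with any other unknown value.
    """
    if entity_type not in ALLOWED_ENTITY_TYPES:
        raise ValueError(f"observation cannot be anchored to {entity_type!r}")
```

An allow-list is used rather than a block-list, so that a new or misspelled entity type is rejected by default instead of slipping through.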
How to cite or use the atlas responsibly
When citing or quoting from the atlas, follow these conventions:
- Cite the atlas as the editorial source and the underlying public record as the primary source. Do not present atlas counts as workforce totals.
- For any specific claim, retrieve the source via the explorer’s audit panel and quote the source’s own title, publisher, publication date, and access date.
- Preserve evidence-tier language: a row from the analytical-proxy group should not be paraphrased as a direct measurement.
- When publishing derivative analysis, flag the same caveats this page lists, particularly the absence of tacit production know-how from the public record.
Suggested citation pattern
China Semiconductor Tooling Talent Atlas (beta dataset), entry for <city / firm / segment>, evidence type <raw value>, source: <title, publisher, publication date, access date>.
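The pattern can be filled mechanically from a source row. This is a hypothetical helper: the `SourceRecord` field names and `atlas_citation` are assumptions chosen to match the fields the pattern asks for. The example values in the test are placeholders, not real sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Fields the citation pattern requires; attribute names are hypothetical."""

    title: str
    publisher: str
    publication_date: str
    access_date: str


def atlas_citation(entry: str, evidence_type: str, source: SourceRecord) -> str:
    """Fill the suggested citation pattern for one atlas entry.

    evidence_type should be the raw value (e.g. "industry_presence"), so the
    evidence tier is preserved in the citation.
    """
    return (
        "China Semiconductor Tooling Talent Atlas (beta dataset), "
        f"entry for {entry}, evidence type {evidence_type}, "
        f"source: {source.title}, {source.publisher}, "
        f"{source.publication_date}, {source.access_date}."
    )
```

Citing the raw evidence_type value, rather than a paraphrase, keeps the distinction between direct records and proxies visible to downstream readers.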