SEC XBRL Filing Format: Engineer's Guide to Reading 10-K Data 2026

When I started building FinanceTrackDaily as an aggregator over SEC EDGAR, I assumed reading a public company's annual report would be straightforward: open the 10-K, scrape the tables, store the numbers. It is not. The SEC publishes the same 10-K in two parallel formats: the human-readable HTML/PDF that journalists quote, and the machine-readable XBRL package that powers financial-data infrastructure across the industry. Anyone aggregating filings, building dashboards, or comparing companies at scale ends up working with the second one.

This article is a technical walkthrough of the SEC's XBRL filing format from an engineering perspective. It explains what XBRL is, why the SEC requires it, how the file structure works, and what observations come out of parsing thousands of these documents. It is not investment advice. It is not a tutorial on what numbers in a 10-K mean for portfolio construction. It is an explanation of the data layer underneath the financial statements you see in news articles.

Disclaimer: This article is for informational and educational purposes only and is not financial advice, investment advice, tax advice, or legal advice. Nothing here constitutes a recommendation to buy, sell, or hold any security. Consult a licensed financial advisor, CPA, or securities attorney before making investment or compliance decisions. The author is a software engineer building a public-data aggregator, not a registered investment adviser, broker-dealer, CFA, or CFP.

What XBRL Actually Is

XBRL stands for eXtensible Business Reporting Language. It is an XML-based open standard maintained by XBRL International, a non-profit consortium, and adopted by securities regulators around the world. The U.S. Securities and Exchange Commission requires registrants to file financial statements in XBRL alongside the traditional HTML 10-K, 10-Q, and 8-K forms. The SEC describes the program at sec.gov/structureddata and publishes the technical specifications it uses for validation.

What XBRL does, mechanically, is replace human-readable line items like "Total revenue: 394,328" with tagged data points: a number, a concept identifier from a standardized taxonomy, a reporting period, a unit of measure, and a reference to the legal entity. Every figure in a financial statement becomes a discrete fact with metadata attached. A computer can read it without parsing prose or table cells.
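To make the "number plus metadata" idea concrete, here is a minimal sketch of one fact as a Python record. The class name, field names, entity identifier, and dates are illustrative assumptions, not part of the XBRL specification; only the concept name and the value come from the example above.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One tagged data point (hypothetical in-memory shape)."""
    concept: str                   # taxonomy concept identifier
    value: Decimal                 # the reported number
    unit: str                      # unit of measure
    period_start: Optional[date]   # None for point-in-time (instant) facts
    period_end: date
    entity_id: str                 # identifier of the reporting entity


# Placeholder entity and calendar-year period, for illustration only.
fact = Fact(
    concept="us-gaap:Revenues",
    value=Decimal("394328000000"),
    unit="USD",
    period_start=date(2022, 1, 1),
    period_end=date(2022, 12, 31),
    entity_id="0000000000",
)
```

Everything a downstream consumer needs to interpret the number travels with it, which is what removes the need to parse table cells.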

The SEC began phasing in XBRL requirements in 2009 for the largest filers and extended the rule to smaller filers in phases over the following years. In 2018 the SEC adopted Inline XBRL (iXBRL), which embeds the tags directly inside the human-readable HTML rather than shipping a separate XBRL document. From an engineering perspective, this matters because a single iXBRL file is now both the official human-readable document and the structured-data source. The divergence between the two views, which used to be a real problem, has mostly disappeared.

Why the SEC Requires It

The official rationale, set out in SEC press releases at the time of adoption, is that structured data lowers the cost of analyzing public-company financials and improves comparability. In practice, three concrete consequences fall out of that.

First, regulators themselves use XBRL data internally. SEC staff run automated checks against filings (outlier detection, year-over-year consistency, subsidiary roll-ups) that would be impractical against free-form HTML.

Second, the data becomes a public good. EDGAR exposes XBRL filings through several APIs documented at sec.gov/edgar/sec-api-documentation, including the company facts API at data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json, which returns every reported XBRL fact for a single company across every filing. This is the API I rely on most when building company-level pages on FinanceTrackDaily. The endpoint is free, requires no key, and is rate-limited to roughly 10 requests per second per IP, with a mandatory descriptive User-Agent header (the SEC's fair-access policy at sec.gov/os/accessing-edgar-data is unambiguous about this).

Third, the financial-data vendor industry (Bloomberg, FactSet, S&P Capital IQ, and the smaller open-source projects) relies on XBRL as a primary or secondary input. The standardization is what makes screening across thousands of companies possible at all.
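For the company facts endpoint mentioned above, a request helper might look like the sketch below. The User-Agent string, the 0.1-second spacing, and the function names are my own placeholders; the zero-padded ten-digit CIK in the URL follows the SEC's API documentation as I understand it, so verify it there before relying on it.

```python
import json
import time
import urllib.request

# Placeholder identity: replace with your own application name and contact.
USER_AGENT = "ExampleAggregator admin@example.com"
# Spacing requests 0.1 s apart stays at or under roughly 10 requests per second.
MIN_INTERVAL = 0.1

_last_request = 0.0


def companyfacts_url(cik: int) -> str:
    """Build the endpoint URL; the CIK is zero-padded to ten digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def fetch_companyfacts(cik: int) -> dict:
    """Fetch one company's facts, sleeping if the previous call was too recent."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    request = urllib.request.Request(
        companyfacts_url(cik), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A single-process throttle like this is only a starting point; several workers behind one IP would need a shared limiter.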

How an XBRL Filing Is Structured

A 10-K XBRL submission is not a single file. It is a package, typically delivered as a ZIP archive when downloaded from EDGAR's archive endpoint. Inside that package you generally find:

  • The instance document: the XML file containing the actual tagged facts. Each fact is an element like <us-gaap:Revenues contextRef="FY" unitRef="USD" decimals="-6">394328000000</us-gaap:Revenues> (the context and unit IDs here are illustrative).
  • The schema (.xsd): declares the taxonomies the filing uses and any custom extensions the company added.
  • Linkbases: XML files describing relationships between tagged elements: which calculations sum to which totals (calculation linkbase), which line items appear in which order on which statement (presentation linkbase), and the human-readable labels (label linkbase).
  • The Inline XBRL HTML: when filed as iXBRL, the rendered HTML 10-K with embedded ix:nonFraction and ix:nonNumeric tags around every reported value.
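To show how the pieces of an instance document fit together, here is a small sketch that reads a hand-written instance with the standard library. The sample document, its context ID, and the taxonomy namespace URI are illustrative assumptions; a real instance carries many contexts, units, and thousands of facts.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature instance; IDs, dates, and entity are placeholders.
INSTANCE = """<?xml version="1.0"?>
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
      xmlns:us-gaap="http://fasb.org/us-gaap/2024">
  <context id="FY">
    <entity><identifier scheme="http://www.sec.gov/CIK">0000000000</identifier></entity>
    <period><startDate>2022-01-01</startDate><endDate>2022-12-31</endDate></period>
  </context>
  <unit id="USD"><measure>iso4217:USD</measure></unit>
  <us-gaap:Revenues contextRef="FY" unitRef="USD" decimals="-6">394328000000</us-gaap:Revenues>
</xbrl>"""

XBRLI = "{http://www.xbrl.org/2003/instance}"
root = ET.fromstring(INSTANCE)

# Index contexts by id so each fact can be joined to its reporting period.
contexts = {}
for ctx in root.findall(f"{XBRLI}context"):
    period = ctx.find(f"{XBRLI}period")
    contexts[ctx.get("id")] = {
        child.tag.replace(XBRLI, ""): child.text for child in period
    }

# Any top-level element with a contextRef attribute is treated as a fact.
facts = []
for el in root:
    if el.get("contextRef") is None:
        continue
    namespace, _, local_name = el.tag[1:].partition("}")
    facts.append({
        "namespace": namespace,
        "concept": local_name,
        "value": int(el.text),
        "decimals": el.get("decimals"),
        "period": contexts[el.get("contextRef")],
    })
```

The join from fact to context is the step that turns a bare number into something with a period attached.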

The taxonomies themselves are external. Most U.S. filers tag against the US-GAAP Financial Reporting Taxonomy, published annually by the Financial Accounting Standards Board (FASB) and approved by the SEC. The 2024 US-GAAP taxonomy contains tens of thousands of concepts, covering everything from Revenues and NetIncomeLoss to highly specific items like OperatingLeaseRightOfUseAsset or DerivativeNotionalAmount. FASB lists the taxonomy at fasb.org/taxonomy. The SEC also maintains the Document and Entity Information (DEI) taxonomy for filing metadata and the Standard Industrial Classification (SIC) lookups for industry codes.

When a company has a financial concept that does not map cleanly onto a standard US-GAAP element, it is allowed to define a custom extension element in its own filing-specific taxonomy. From a parsing perspective, extensions are where most of the hard work lives. A naïve aggregator that only reads standard tags will silently miss a meaningful percentage of facts on filings from companies with unusual business models: REITs, insurers, financial institutions, and large diversified conglomerates extend heavily.
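One rough way to measure how much a filing leans on extensions is to classify each fact by the host of its namespace URI. This is a heuristic sketch: the set of "standard" hosts is my assumption and is not exhaustive, and the REIT namespace and concept are made up for illustration.

```python
from urllib.parse import urlparse

# Assumption: namespaces hosted here are standard taxonomies. Extend as needed.
STANDARD_HOSTS = {"fasb.org", "xbrl.sec.gov"}


def is_extension(namespace_uri: str) -> bool:
    """Treat any namespace outside the known standard hosts as a company extension."""
    host = urlparse(namespace_uri).hostname or ""
    return host not in STANDARD_HOSTS


# (namespace, concept) pairs; the third entry is a hypothetical company extension.
tagged = [
    ("http://fasb.org/us-gaap/2024", "Revenues"),
    ("http://xbrl.sec.gov/dei/2024", "DocumentPeriodEndDate"),
    ("http://www.example-reit.com/20241231", "FundsFromOperations"),
]

extension_share = sum(is_extension(ns) for ns, _ in tagged) / len(tagged)
```

Tracking this share per filing gives an early warning for companies where a standard-tags-only pipeline would lose data.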

What Parsing Filings Looks Like in Practice

Aggregating SEC EDGAR taught me that the gap between "XBRL is a structured format" and "you can trust any single tag" is wider than the marketing material suggests. A few patterns from spending time inside the data.

Periods are not always what you expect. Every fact carries a contextRef pointing to a context block that defines the reporting period. For a 10-K, you typically expect one fiscal year and four quarters. In practice, filings frequently include facts for prior comparison years, restated periods, segment-level periods, and entity-specific period definitions. Picking "the" current-year revenue figure means filtering carefully by period type, segment dimension, and the document period end date declared in DEI. Skip any of those filters and you end up averaging unrelated numbers.
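The filtering described above can be sketched as follows. The dictionary shape (start, end, dims, val) and the 350 to 380 day window for "one fiscal year" are assumptions of this example, not anything the format prescribes.

```python
from datetime import date


def pick_annual_value(facts, period_end):
    """Return the single full-year, non-dimensional value ending on period_end."""
    candidates = [
        f for f in facts
        if f["end"] == period_end                         # matches the DEI period end
        and f.get("start") is not None                    # duration fact, not instant
        and 350 <= (f["end"] - f["start"]).days <= 380    # roughly one fiscal year
        and not f.get("dims")                             # consolidated, no segment
    ]
    values = {f["val"] for f in candidates}
    if len(values) != 1:
        raise LookupError(f"expected one annual value, found {len(values)}")
    return values.pop()


# Made-up facts for one concept in one filing.
sample = [
    {"start": date(2022, 1, 1), "end": date(2022, 12, 31), "dims": {}, "val": 1000},
    {"start": date(2021, 1, 1), "end": date(2021, 12, 31), "dims": {}, "val": 900},
    {"start": date(2022, 10, 1), "end": date(2022, 12, 31), "dims": {}, "val": 300},
    {"start": date(2022, 1, 1), "end": date(2022, 12, 31),
     "dims": {"segment": "A"}, "val": 600},
]

current_year = pick_annual_value(sample, date(2022, 12, 31))
```

Raising when the answer is ambiguous is deliberate: a loud failure is easier to debug than a silently wrong number.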

The same concept can appear at multiple granularities. us-gaap:Revenues may show up at the consolidated level, broken out by reportable segment, broken out by geography, and broken out by product. Each of those is a legitimate fact tagged with the same concept and different dimensional context. If you sum naively, you double- or triple-count.
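A toy illustration of the double-counting problem, using made-up numbers: every breakdown below sums to the same consolidated total, so adding all facts for the concept triples it. The fact shape and member names are assumptions; the axis names are meant only as examples of dimensional qualifiers.

```python
from collections import defaultdict

# All facts share one concept and one period; only the dimensions differ.
facts = [
    {"dims": {}, "val": 1000},
    {"dims": {"us-gaap:StatementBusinessSegmentsAxis": "ex:SegmentA"}, "val": 600},
    {"dims": {"us-gaap:StatementBusinessSegmentsAxis": "ex:SegmentB"}, "val": 400},
    {"dims": {"srt:StatementGeographicalAxis": "country:US"}, "val": 700},
    {"dims": {"srt:StatementGeographicalAxis": "ex:International"}, "val": 300},
]

naive_total = sum(f["val"] for f in facts)

# Grouping by the set of axes used keeps each breakdown separate.
totals_by_axes = defaultdict(int)
for f in facts:
    totals_by_axes[tuple(sorted(f["dims"]))] += f["val"]

consolidated = totals_by_axes[()]  # the empty tuple means no dimensions at all
```

Here `naive_total` comes out at three times `consolidated`, which is exactly the failure mode to guard against.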

Calculation linkbases are guidance, not enforcement. The calculation linkbase encodes which children sum to which parents: Cost of Goods Sold and Selling Expenses summing to Total Operating Expenses, for example. The SEC validates these but does not always reject filings with inconsistencies. Real filings contain rounding artifacts, missing intermediate totals, and occasional outright sum mismatches that you have to handle gracefully rather than treating as fatal.
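A tolerant consistency check might look like this sketch. Calculation relationships carry a weight of +1 or -1 per child; the function name, the tolerance policy, and the numbers are my own illustration, not the SEC's validation rule.

```python
def check_calculation(parent, children, tolerance):
    """Compare a reported total with the weighted sum of its children.

    children is a list of (weight, value) pairs, weight being +1 or -1.
    Returns (is_consistent, difference) instead of raising, so callers can
    log a mismatch and carry on.
    """
    expected = sum(weight * value for weight, value in children)
    difference = parent - expected
    return abs(difference) <= tolerance, difference


# Gross profit = revenue - cost of revenue, all reported in rounded millions.
children = [(+1, 1000), (-1, 600)]

rounding_case = check_calculation(401, children, tolerance=1)   # off by rounding
mismatch_case = check_calculation(450, children, tolerance=1)   # a real mismatch
```

Returning the difference alongside the verdict makes it easy to separate one-unit rounding noise from genuine errors when reviewing logs.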

Decimals and scale require care. XBRL facts carry a decimals attribute: -6 means rounded to the nearest million, -3 means thousands. The number stored is the actual value, but the precision claim is what matters when comparing across companies that report at different scales.
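One way to use the decimals attribute when comparing two reports of the same quantity is to allow a difference of half the coarser rounding unit. That tolerance rule is an assumption of this sketch, as are the function names and the second sample value.

```python
from decimal import Decimal


def rounding_unit(decimals: str) -> Decimal:
    """Size of the rounding step implied by decimals ("INF" means exact)."""
    if decimals == "INF":
        return Decimal(0)
    return Decimal(10) ** -int(decimals)


def agree(a: Decimal, a_decimals: str, b: Decimal, b_decimals: str) -> bool:
    """True if a and b differ by no more than half the coarser rounding unit."""
    tolerance = max(rounding_unit(a_decimals), rounding_unit(b_decimals)) / 2
    return abs(a - b) <= tolerance


# One figure rounded to millions, another reported to the whole dollar.
same_quantity = agree(
    Decimal("394328000000"), "-6",
    Decimal("394328123456"), "0",
)
```

Here `rounding_unit("-6")` is one million, so a gap of 123,456 falls inside the tolerance and the two figures are treated as the same quantity.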

Inline XBRL parsing is its own discipline. Reading iXBRL means walking HTML and pulling out ix: namespace elements rather than parsing pure XML. The Arelle open-source XBRL processor at arelle.org is the reference implementation most engineers reach for; the Python XBRL libraries, python-xbrl among them, vary in quality and maintenance.
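For a feel of what pulling ix: elements out of HTML involves, here is a deliberately small sketch built on the standard library's html.parser. It handles only flat ix:nonFraction elements with comma thousands separators plus the scale and sign attributes; nested elements, continuations, and other number formats are ignored, and the sample markup is hand-written. For production work a full processor such as Arelle is the safer choice.

```python
from decimal import Decimal
from html.parser import HTMLParser


class InlineFactParser(HTMLParser):
    """Collect (name, contextRef, value) from flat ix:nonFraction elements."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._attrs = None   # attributes of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "ix:nonfraction":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "ix:nonfraction" or self._attrs is None:
            return
        digits = "".join(self._text).replace(",", "").strip()
        # scale="6" means the displayed number is in millions.
        value = Decimal(digits).scaleb(int(self._attrs.get("scale", "0")))
        if self._attrs.get("sign") == "-":
            value = -value
        self.facts.append((self._attrs["name"], self._attrs["contextref"], value))
        self._attrs = None


HTML = """
<table><tr>
  <td><ix:nonFraction name="us-gaap:Revenues" contextRef="FY" unitRef="USD"
      decimals="-6" scale="6" format="ixt:num-dot-decimal">394,328</ix:nonFraction></td>
  <td>(<ix:nonFraction name="us-gaap:NetIncomeLoss" contextRef="FY" unitRef="USD"
      decimals="-6" scale="6" sign="-" format="ixt:num-dot-decimal">1,200</ix:nonFraction>)</td>
</tr></table>
"""

parser = InlineFactParser()
parser.feed(HTML)
parser.close()
```

Note that the displayed text is not the value: the scale and sign attributes have to be applied before the number means anything.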

None of this is dramatic. It is the normal friction of working with a real-world standardized format that has accumulated edge cases over more than a decade. But it does mean that "I downloaded the XBRL filing and stored the numbers" understates what aggregation actually involves.

Three Concrete Engineering Observations from the Aggregation

A few specific data points from building over the EDGAR APIs.

  1. The SEC's companyfacts API returns a single JSON document per company that can grow to several megabytes for large, long-listed filers: the history of every tagged concept back to the company's first XBRL filings, in one response. For ingestion, requesting once per company per refresh cycle is far cheaper than walking individual filings, which is why most aggregators, including mine, are organized around that endpoint. The trade-off is that updates lag the real-time submissions feed by hours.
  2. Across the roughly 3,400 U.S.-listed common stocks I track, the median 10-K iXBRL filing is several megabytes; the largest exceed 50 megabytes for the most complex filers. Storing raw filings versus storing only the parsed-out facts is an order-of-magnitude decision for any aggregator's storage budget.
  3. The SEC's daily filing index, available under https://www.sec.gov/Archives/edgar/daily-index/{year}/QTR{q}/, lists every filing accepted on a given day (the sibling full-index/ path holds the cumulative quarterly index). It is the canonical source for "what changed" and is much more reliable than scraping landing pages. The same fair-access User-Agent rule applies.
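To illustrate the first observation, here is a sketch that pulls an annual series for one concept out of a companyfacts-style document. The sample is hand-written and heavily trimmed; the field names (units, end, val, fp, form, filed) reflect my reading of the response shape and should be checked against the SEC's API documentation. Keeping the most recently filed value per period end is one possible policy for restatements, not the only one.

```python
# Trimmed, made-up sample in the shape of a companyfacts response.
SAMPLE = {
    "cik": 1234,
    "entityName": "Example Corp",
    "facts": {"us-gaap": {"Revenues": {"units": {"USD": [
        {"end": "2021-12-31", "val": 900, "fp": "FY", "form": "10-K", "filed": "2022-02-15"},
        {"end": "2022-09-30", "val": 700, "fp": "Q3", "form": "10-Q", "filed": "2022-11-01"},
        {"end": "2022-12-31", "val": 1000, "fp": "FY", "form": "10-K", "filed": "2023-02-15"},
        # Prior year repeated as a comparative in the later 10-K, with a restated value.
        {"end": "2021-12-31", "val": 905, "fp": "FY", "form": "10-K", "filed": "2023-02-15"},
    ]}}}},
}


def annual_series(companyfacts, taxonomy, concept, unit="USD"):
    """Map period end to the most recently filed 10-K value for one concept."""
    rows = companyfacts["facts"][taxonomy][concept]["units"][unit]
    latest = {}
    for row in rows:
        if row["form"] != "10-K" or row["fp"] != "FY":
            continue
        previous = latest.get(row["end"])
        # ISO dates compare correctly as strings.
        if previous is None or row["filed"] > previous["filed"]:
            latest[row["end"]] = row
    return {end: row["val"] for end, row in sorted(latest.items())}


revenue_by_year = annual_series(SAMPLE, "us-gaap", "Revenues")
```

Keying only on the period end would collide if a 10-K also reported a shorter duration ending on the same date, so a fuller version should check the period start as well.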

What XBRL Is Not For

A careful note, because the engineering excitement around structured financial data sometimes runs ahead of what the data can support.

XBRL gives you tagged numbers, not interpretation. It tells you that a company reported $394 billion of revenue under the us-gaap:Revenues concept for the fiscal period ending on a certain date. It does not tell you whether that revenue is high quality, sustainable, comparable to a competitor's differently-segmented revenue, or relevant to any particular investment thesis. Reading 10-K filings well, including the management discussion, risk factors, and footnotes, is a skill that the structured-data layer assists with but does not replace. The SEC's investor education site at investor.gov is consistent on this point: filings are inputs to research, not endorsements.

XBRL also does not relieve filers or analysts of accounting judgment. Two companies in the same industry can apply US-GAAP to similar transactions and arrive at different numbers, and the tags will faithfully record both. This is not a defect of the format; it is a property of accrual accounting.

For investors and consumers reading this from a personal-finance angle: the existence of XBRL is one of the reasons the SEC's investor-protection mission, set out at sec.gov/about, is enforceable at scale. Public companies cannot quietly bury inconsistencies the way they could when filings were paper. The format is a public good worth knowing about even if you never write a line of code against it.

Where to Read More

Authoritative starting points, in roughly the order I find useful:

  • SEC's structured data overview: sec.gov/structureddata
  • SEC EDGAR API documentation: sec.gov/edgar/sec-api-documentation
  • SEC's fair-access and User-Agent policy: sec.gov/os/accessing-edgar-data
  • FASB US-GAAP Financial Reporting Taxonomy: fasb.org/taxonomy
  • XBRL International technical specifications: xbrl.org/specifications
  • SEC investor education on reading filings: investor.gov/introduction-investing/investing-basics/glossary/form-10-k

For engineers specifically, the Arelle processor at arelle.org and the SEC's own EDGAR Filer Manual published at sec.gov/info/edgar are the documents I keep open while writing aggregation code.

Closing Note from the Engineering Side

Building over SEC EDGAR has reinforced a non-obvious point: the reason high-quality, free, structured financial data exists at all in the United States is that a regulator decided to require it and accepted the cost of maintaining the infrastructure. The vendors who package this data into expensive products are reselling a public good, with real value-add in cleaning and analytics layered on top. Anyone who wants to understand where the underlying numbers come from can go to the source, which is, on balance, a good thing for market transparency.

If you take one practical thing away from this piece: when you read a financial figure quoted in a news article about a public company, that figure traces back, almost certainly, to a tag in an XBRL filing on EDGAR. You can read the filing yourself, for free, and check.

Reminder: This article is for informational and educational purposes only. It is not financial, investment, tax, or legal advice. The author is a software engineer, not a registered investment adviser, broker-dealer, CFA, or CFP. Always consult a licensed professional before making financial decisions.
