SEC EDGAR Full-Text Search API: An Engineer's Guide for 2026
Disclaimer: This article is for informational and educational purposes only and is not financial, legal, or investment advice. The author is a software engineer who builds financial data aggregators, not a registered investment advisor, broker-dealer, CFA, or CFP. Consult a licensed financial advisor before making investment decisions. SEC filings can contain forward-looking statements that may not reflect current company conditions.
Why I Started Querying EDGAR Programmatically
Building FinanceTrackDaily on top of the SEC's EDGAR system meant aggregating filings from more than 3,400 US-listed stocks, and the part that surprised me first was not the volume but the structure. Most retail investors I have shown the platform to think EDGAR is just a search box on sec.gov. From an engineering perspective, EDGAR is closer to a layered API surface: a metadata index, a full-text search endpoint, a per-CIK submissions feed, and a static archive of filing documents. Knowing which surface to query for which question has saved me hours of wasted aggregation work.
This guide is the article I wish I had read when I first started writing the EDGAR ingestion job. It walks through what the SEC EDGAR Full-Text Search API actually returns, how it differs from the per-company submissions endpoint, the rate limits that will trip you up, and the educational use cases retail investors can run themselves without paying for a Bloomberg terminal. It is not a list of stocks to buy. It is a builder's view of a public data source.
What the Full-Text Search API Actually Does
The SEC operates a search service at https://efts.sec.gov/LATEST/search-index?q=..., often called the EDGAR Full-Text Search (EFTS) API. The official user-facing search form lives at sec.gov/edgar/search, and this JSON endpoint is what powers that front-end. According to the SEC's own EDGAR API documentation page, the service indexes the full text of filings going back to 2001, with a separate submissions API providing structured metadata for each filer.
What surprised me when I first hit the endpoint: it returns up to 100 hits per request, paginated by a from parameter. The response is a JSON envelope with a hits.hits array, each item carrying the accession number, file date, form type, and a snippet of matched text. There is no rate-limit token in the response header that I could find, but the SEC's fair access policy caps automated traffic at 10 requests per second per IP, and they require a descriptive User-Agent string with contact information. I learned that one the hard way after a blocked IP at 4am.
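A minimal sketch of how the two fair-access requirements above (a descriptive User-Agent and at most 10 requests per second) might be enforced client-side. The `Throttle` class and `sec_headers` helper are hypothetical names for illustration, not part of any SEC library, and the injectable `clock` and `sleep` arguments exist only to make the logic easy to test.

```python
import time


class Throttle:
    """Spaces out calls so that at most `max_per_second` start each second."""

    def __init__(self, max_per_second=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request slot is free, then reserve it."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def sec_headers(contact):
    """Request headers; `contact` should name you and include a real email address."""
    return {"User-Agent": contact, "Accept-Encoding": "gzip, deflate"}
```

Calling `throttle.wait()` before every request keeps a single process at or below the ceiling. Running well under 10 per second leaves headroom for other jobs sharing the same IP.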
Three Endpoints That Cover Most Aggregation Needs
- Full-Text Search (efts.sec.gov/LATEST/search-index): keyword and phrase queries across filing bodies. Useful when you want to find every 10-K that mentions a specific risk factor phrase.
- Submissions (data.sec.gov/submissions/CIK{10-digit}.json): structured metadata for one filer: filings list, form type, accession number, primary document. This is the workhorse for building per-company timelines.
- Company Facts (data.sec.gov/api/xbrl/companyfacts/CIK{10-digit}.json): XBRL-tagged financials, the same data points public companies file in machine-readable form. The companion SEC XBRL engineer guide walks through the tag taxonomy in more detail.
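As a rough illustration, the three endpoints above can be wrapped in small URL builders. The function names are hypothetical, and the `forms` query parameter is taken from the curl example later in this guide rather than from official documentation.

```python
from urllib.parse import urlencode


def full_text_search_url(query, forms=None):
    """Build a full-text search URL; `forms` is an optional form-type filter."""
    params = {"q": query}
    if forms:
        params["forms"] = forms
    return "https://efts.sec.gov/LATEST/search-index?" + urlencode(params)


def submissions_url(cik):
    """Per-filer metadata feed; the CIK is zero-padded to ten digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def company_facts_url(cik):
    """XBRL company-facts feed for one filer, same CIK padding."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"
```

For example, `submissions_url(320193)` yields the Apple Inc. feed URL ending in `CIK0000320193.json`.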
For the FinanceTrackDaily ingestion job, I use submissions for the per-stock feed and full-text search for cross-cutting questions. I rarely use the XBRL company-facts endpoint at ingest time because the volumes are large and the data only changes when a new periodic report is filed.
Engineering Observations from Aggregating 3,400 Tickers
Here are the patterns I noticed after running the ingestion daily for several months. None of these are stock picks. They are observations about how the data behaves.
1. Filing volume is highly seasonal. The 10-Q and 10-K filing windows cluster around fiscal-quarter-end plus 40 to 90 days, depending on filer size. The SEC categorizes filers as large accelerated, accelerated, and non-accelerated, and each tier has different deadlines spelled out in SEC Form 10-K instructions. Large accelerated filers must submit a 10-K within 60 days of fiscal year-end; non-accelerated filers get 90. If your aggregator only ingests once a day, you will see daily filing counts swing from under 50 to well over 800 during peak weeks.
2. Form 8-K is the noisy signal. 8-K is the current report form, used to announce material events between periodic reports. According to the SEC's Form 8-K instructions, there are at least 30 distinct item numbers covering events like leadership changes (Item 5.02), acquisitions (Item 2.01), and bankruptcy (Item 1.03). When I categorize the 8-Ks I aggregate, roughly half are routine: earnings announcements (Item 2.02) and Reg FD disclosures (Item 7.01). The other half is where the surprises live.
3. The submissions endpoint truncates at 1,000 filings. If a company has filed more than 1,000 documents (very common for older filers like General Electric or IBM), the JSON response keeps the most recent filings under filings.recent and chunks the rest into separate JSON files listed under filings.files. New developers hit this quickly when querying old issuers and assume the API is broken. It is not; you have to follow the pagination index.
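The third observation is easier to see in code. This is a sketch under the assumption that the submissions JSON stores recent filings as parallel arrays under `filings.recent` and lists the overflow files by `name` under `filings.files`. The helper names are hypothetical, and the input is an already-parsed dict rather than a live response.

```python
def recent_filings(submissions):
    """Turn the column-oriented `filings.recent` object into one dict per filing."""
    recent = submissions["filings"]["recent"]
    columns = ("accessionNumber", "filingDate", "form", "primaryDocument")
    rows = zip(*(recent[c] for c in columns))
    return [dict(zip(columns, row)) for row in rows]


def older_filing_urls(submissions):
    """URLs of the extra JSON files that hold filings beyond the recent window."""
    extra = submissions["filings"].get("files", [])
    return [f"https://data.sec.gov/submissions/{entry['name']}" for entry in extra]
```

An ingestion job would fetch each URL from `older_filing_urls` in turn (politely, within the rate limit) to rebuild the full filing history for an old issuer.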
Writing a Polite Query: A Minimal Engineer Example
The official SEC documentation uses curl for examples. Here is the same call with the headers the SEC actually wants. Replace the contact info with your own real address before running it. Per SEC fair access policy, requests without a contact User-Agent can be silently throttled.
curl -s -H "User-Agent: YourName your.name@example.com" \
-H "Accept-Encoding: gzip, deflate" \
"https://efts.sec.gov/LATEST/search-index?q=%22climate-related+risks%22&forms=10-K&dateRange=custom&startdt=2025-01-01&enddt=2025-12-31"
The response is JSON. The hits.total.value field tells you how many filings matched, and the hits.hits array contains snippets. For high-volume queries, paginate with from in increments of the page size (up to 100 hits per request, as noted earlier) up to a maximum of 10,000 results per query. If you need more than 10,000 hits, narrow the query with date or form-type filters.
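The pagination arithmetic can be sketched as follows. The helper names are hypothetical, `page_size` is deliberately left as a parameter (set it to however many hits the endpoint actually returns per request), and the 10,000 ceiling is the figure quoted above.

```python
from urllib.parse import urlencode

MAX_RESULTS = 10_000  # per-query ceiling quoted in the text above


def total_hits(envelope):
    """Read the match count from a parsed search response."""
    return envelope["hits"]["total"]["value"]


def page_offsets(total, page_size):
    """Every `from` offset needed to walk the reachable part of a result set."""
    reachable = min(total, MAX_RESULTS)
    return list(range(0, reachable, page_size))


def search_page_url(query, offset, **filters):
    """Search URL for one page; `from` is set via the dict because it is a Python keyword."""
    params = {"q": query, **filters}
    params["from"] = offset
    return "https://efts.sec.gov/LATEST/search-index?" + urlencode(params)
```

A backfill would request offset 0 first, read `total_hits` from that response, then iterate over the remaining offsets. If the total exceeds the ceiling, splitting the date range into smaller windows is the usual way around it.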
Educational Use Cases That Don't Require a Bloomberg Terminal
The SEC's Investor.gov portal explicitly encourages retail investors to read primary filings. The Full-Text Search API makes a few lightweight research workflows possible without paid data services:
- Track risk-factor language drift. Compare a company's 10-K Item 1A risk factors year over year. Many companies copy and paste, so the diff highlights actually new risks management chose to disclose. This is a pattern the FINRA Investor Education pages also discuss as a research method.
- Search insider transaction language across filings. Form 4 filings (covered in our Form 4 insider trading guide) have machine-readable XML, but the prose narratives in 10-Q MD&A sections sometimes contextualize what the Form 4 numbers mean.
- Find every 8-K Item 5.02 (executive departure) in a sector over a date window. This is one query against the EFTS endpoint with a form filter and a phrase. The result is not investment advice; it is a starting point for further reading.
- Cross-reference 13F holdings with proxy statements. The Form 13F guide covers institutional-holdings reporting; combined with DEF 14A proxy disclosures, you can see when major holders are also voting against management.
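The first workflow in the list, risk-factor drift, can be approximated with Python's standard `difflib`. This is a sketch that assumes you have already extracted the Item 1A text for two years as plain strings with blank lines between paragraphs; `new_paragraphs` is a hypothetical helper name.

```python
import difflib


def new_paragraphs(last_year, this_year):
    """Return paragraphs that appear in this year's text but not in last year's."""
    old = [p.strip() for p in last_year.split("\n\n") if p.strip()]
    new = [p.strip() for p in this_year.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" is brand-new text; "replace" is a paragraph that was reworded.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added
```

Because the comparison is exact per paragraph, a risk factor with a single changed word is reported as new. That is usually acceptable for a first pass, since reworded risk language is worth reading too.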
None of these workflows tell you whether to buy or sell. They give you primary-source context for whatever investment decision you and your licensed advisor reach.
Common Pitfalls I Hit (So You Don't)
The first time I ran a backfill, I sent the EFTS endpoint about 400 queries per minute from a single IP. By query 380, every response was a 403. That rate averages under 7 requests per second, so short bursts can evidently trip the limit even when the average looks safe. Lesson one: the 10 req/sec ceiling is not a guideline. Lesson two: even within the limit, exponential backoff on a 429 or 5xx is non-optional. The SEC's infrastructure is shared with every retail investor, every academic researcher, and every other aggregator. They are explicit about this in the fair access policy.
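Here is a minimal sketch of the backoff described above. `fetch` is a hypothetical zero-argument callable that returns a `(status, body)` pair, so the retry logic stays independent of whichever HTTP client you use. The base delay, cap, and attempt count are illustrative defaults, not SEC recommendations.

```python
import random
import time


def is_retryable(status):
    """Retry only on 429 and 5xx, as described in the text."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch, max_attempts=5, sleep=time.sleep, jitter=random.random):
    """Call `fetch()` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts - 1:
            # Up to one second of random jitter keeps parallel workers from retrying in lockstep.
            sleep(backoff_delay(attempt) + jitter())
    return status, body
```

Note that a 403 is returned immediately rather than retried. Once an IP is blocked, hammering the endpoint again is unlikely to help.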
The second pitfall is encoding. EDGAR filings going back to 2001 contain a mix of HTML, plain text, and SGML envelopes. Older 10-Ks are routinely 12 MB single-file HTML documents with inline images. If you naively pull the primary document and store it in a database column, you will fill a hosting plan quickly. I learned to store filings on object storage and only persist metadata plus a parsed-text excerpt in MySQL.
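One way to keep the database row small is to strip markup and persist only a short excerpt alongside the object-storage key. The sketch below uses the standard-library `html.parser`. The `metadata_row` layout and the `object_key` field are hypothetical, not the actual FinanceTrackDaily schema.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def text_excerpt(html, max_chars=2000):
    """Plain-text excerpt of a filing: tags dropped, whitespace collapsed, truncated."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return text[:max_chars]


def metadata_row(accession, form, filing_date, object_key, html):
    """Row to persist in the database; the full document stays in object storage."""
    return {
        "accession": accession,
        "form": form,
        "filing_date": filing_date,
        "object_key": object_key,
        "excerpt": text_excerpt(html),
    }
```

Inline images embedded as attribute values never reach `handle_data`, so they drop out of the excerpt without special handling.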
The third pitfall is CIK formatting. The Central Index Key (CIK) is the unique identifier the SEC assigns each filer. The submissions endpoint requires CIK with leading zeros to ten digits: Apple Inc. is CIK0000320193, not CIK320193. Format mismatches give you a 404 that looks like the company does not exist. It does. You typed the URL wrong.
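A small normalizer at the edge of the ingestion job avoids this class of 404. This sketch accepts integers or strings with an optional "CIK" prefix and rejects anything that is not one to ten digits. `normalize_cik` is a hypothetical helper name.

```python
import re


def normalize_cik(raw):
    """Return the 10-digit, zero-padded CIK string for inputs like 320193 or 'CIK320193'."""
    text = str(raw).strip().upper()
    if text.startswith("CIK"):
        text = text[3:]
    if not re.fullmatch(r"\d{1,10}", text):
        raise ValueError(f"not a valid CIK: {raw!r}")
    return text.zfill(10)
```

Failing loudly on input such as a ticker symbol is deliberate: a ValueError at parse time is easier to debug than a 404 from the endpoint.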
How This Compares to Paid Data Vendors
A common question I get: why not just pay for a vendor feed? For an aggregator at the scale of FinanceTrackDaily, the answer is that the SEC source is the same data the vendors are reselling, with a small parsing tax. The trade-off is engineering time. If you are an individual investor reading filings for your own portfolio, the SEC's free EDGAR is more than enough. If you are running quantitative research at institutional scale, vendor-normalized feeds save weeks of XBRL-tagging work, but they cost real money, and the underlying source is still the SEC.
The Federal Reserve Economic Data (FRED) system is a similar story for macroeconomic data: free, well-documented, and the source vendors resell. Engineers building public-finance tools should learn both before reaching for paid APIs.
Practical Tips for Retail Investors Reading Filings
Even without writing code, the EDGAR full-text search front-end at sec.gov/edgar/search is a usable research tool. A few things I have shown non-engineer friends:
- Search a phrase you expect a company to disclose, like the name of a regulator or a specific product line, and filter by form type 10-K. The matched snippets often surface disclosures that headline news skipped.
- Use the date range filter to narrow to filings made in the last 30 days, which is useful for surfacing recently filed 8-Ks during earnings season.
- When a filing references an exhibit (Exhibit 10.x for material contracts is a common one), click through. The exhibits often contain the actual contract terms that the body of the 10-K only summarizes.
None of this replaces working with a licensed financial advisor. It does mean you can verify claims a brokerage analyst or social-media stock tipster makes about a company by reading the source filing yourself.
FAQ
Is the SEC EDGAR Full-Text Search API free?
Yes. The SEC offers EDGAR data at no charge as part of its public-disclosure mandate. Fair-access rules apply: 10 requests per second per IP, descriptive User-Agent required.
Do I need an API key?
No. The endpoint is open. The User-Agent header is the only authentication-like requirement and it exists for traffic accountability, not access control.
How far back does the full-text index go?
The SEC's documentation states the full-text index covers filings from 2001 onward. Older filings exist in the EDGAR archive but are not indexed for keyword search.
What's the difference between EDGAR Online and EDGAR proper?
EDGAR Online was a third-party commercial service that repackaged EDGAR data; the authoritative system is the SEC's own EDGAR at sec.gov/edgar. Always cite the SEC source for primary filings.
Can I run sentiment analysis on EDGAR filings?
Yes, and academic researchers do. The Loughran-McDonald financial sentiment dictionary is a commonly cited starting point. From an engineering perspective, the harder problem is normalizing filings across HTML, plain-text, and inline-XBRL formats.
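For a sense of what a dictionary-based approach looks like, here is a toy sketch. The word list is a small placeholder chosen for illustration and is not drawn from the actual Loughran-McDonald dictionary, which you would load from its published file instead. `negative_word_share` is a hypothetical helper name.

```python
import re
from collections import Counter

# Placeholder list for illustration only; substitute a real sentiment dictionary.
NEGATIVE_WORDS = frozenset({"loss", "losses", "impairment", "litigation", "default"})


def negative_word_share(text, negative_words=NEGATIVE_WORDS):
    """Fraction of alphabetic tokens in `text` that appear in `negative_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in negative_words)
    return hits / len(tokens)
```

The score is only meaningful once the input text has been normalized consistently across formats, which is the harder problem mentioned above.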
Does FinanceTrackDaily resell EDGAR data?
No. FinanceTrackDaily is an aggregator that surfaces filing metadata and links back to the original SEC source. The platform is an engineering project, not a registered investment advisor or broker-dealer.
Final Word from the Engineer's Bench
Public-disclosure infrastructure like EDGAR is one of the genuinely good things the US securities regulatory system produces. From an engineering perspective, it is also a well-designed, free, mostly-stable data source that any individual investor can query without intermediaries. The Full-Text Search API is the layer most people overlook, partly because the documentation is sparse and partly because the interesting questions require some patience with rate limits and pagination.
Reading filings yourself does not turn you into an investment advisor. It does mean that when a financial advisor, brokerage report, or social-media post tells you something about a company, you can go to the primary source and check. That habit, in my own opinion as an engineer who has read more 10-Ks than I ever expected to, is one of the most defensible educational practices a retail investor can build.
Reminder: This article is for informational and educational purposes only and does not constitute financial, legal, tax, or investment advice. The author is a software engineer, not a licensed financial advisor or broker-dealer. Past disclosures in SEC filings do not predict future performance. Always consult a licensed advisor before making investment decisions.