Show HN: Robust LLM Extractor for Websites in TypeScript
We've been building data pipelines that scrape websites and extract structured data for a while now. If you've done this, you know the drill: you write CSS selectors, the site changes its layout, everything breaks at 2am, and you spend your morning rewriting parsers.

LLMs seemed like the obvious fix: just throw the HTML at GPT and ask for JSON. Except in practice, it's more painful than that:

- Raw HTML is full of nav bars, footers, and tracking junk that eats your token budget. A typical product page is 80% noise.
- LLMs return malformed JSON more often than you'd expect, especially with nested arrays and complex schemas. One bad bracket and your pipeline crashes.
- Relative URLs, markdown-escaped links, tracking parameters: the "small" URL issues compound fast when you're processing thousands of pages.
- You end up writing the same boilerplate: HTML cleanup → markdown conversion → LLM call → JSON parsing → error recovery → schema validation. O
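
To make the JSON and URL pain points above concrete, here is a minimal sketch of the recovery steps in plain TypeScript with no dependencies. Everything in it is an assumption for illustration: the function names (`normalizeUrl`, `parseLooseJson`, `salvageProducts`), the `Product` shape, and the list of tracking parameters are hypothetical and are not the API of the library in this post.

```typescript
// Hypothetical helpers for illustration only; not the library's actual API.

// Assumed list of tracking parameters to strip; real pipelines may need more.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "fbclid", "gclid",
];

/** Resolve a possibly relative, markdown-escaped URL and drop tracking params. */
function normalizeUrl(raw: string, baseUrl: string): string | null {
  // Undo markdown escapes such as "\_" that can leak into extracted links.
  const unescaped = raw.replace(/\\([_*()\[\]~])/g, "$1");
  let url: URL;
  try {
    url = new URL(unescaped, baseUrl); // resolves relative paths against the page URL
  } catch {
    return null; // not a usable URL
  }
  for (const key of TRACKING_PARAMS) {
    url.searchParams.delete(key);
  }
  return url.toString();
}

/** Parse LLM output that may be wrapped in prose or fences, or have trailing commas. */
function parseLooseJson(text: string): unknown {
  // Keep only the outermost {...} or [...] span.
  const start = text.search(/[\[{]/);
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  if (start === -1 || end <= start) return null;
  const candidate = text.slice(start, end + 1);
  try {
    return JSON.parse(candidate);
  } catch {
    // Naive repair: remove commas that directly precede a closing bracket.
    // This can also touch such sequences inside string values.
    try {
      return JSON.parse(candidate.replace(/,\s*([}\]])/g, "$1"));
    } catch {
      return null;
    }
  }
}

interface Product {
  name: string;
  url: string;
}

/** Keep the array items that match the expected shape instead of failing the whole batch. */
function salvageProducts(parsed: unknown, baseUrl: string): Product[] {
  const items = Array.isArray(parsed) ? parsed : [];
  const products: Product[] = [];
  for (const item of items) {
    if (typeof item !== "object" || item === null) continue;
    const record = item as Record<string, unknown>;
    const name = record["name"];
    const rawUrl = record["url"];
    if (typeof name !== "string" || typeof rawUrl !== "string") continue;
    const url = normalizeUrl(rawUrl, baseUrl);
    if (url !== null) products.push({ name, url });
  }
  return products;
}
```

Chaining the pieces would look like `salvageProducts(parseLooseJson(llmOutput), pageUrl)`. The design choice worth noting is that one bad item or one stray comma costs a single record rather than the whole page; whether that tradeoff suits a given pipeline depends on how strict the downstream schema needs to be.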