Evaluating Webpage Fact Extraction with Braintrust - Part 1
Oct 22, 2024
Introduction
In AI-driven web scraping, accurately identifying and extracting facts from webpages can be a challenge. Whether it's determining if a page is a blog post, press release, or directory, the data needs to be structured correctly for downstream applications. This is where evaluations come into play, allowing us to measure the accuracy and reliability of the extracted information. In this post, I’ll walk you through how we integrate Braintrust to evaluate prompt-driven web scraping and how we use a custom JSON scorer to ensure that entities and facts are extracted correctly from webpages.
Why Evaluations Matter in Web Scraping
Scraping is more than just grabbing HTML content—it's about identifying and extracting meaningful entities and facts. Here’s why evaluations are crucial:
Ensure accuracy: As we scrape and process data from web pages, we need to verify that the correct page type and details were extracted.
Improve reliability: Constant feedback on the scraping process helps refine the LLM prompts we use, making the system more robust over time.
Automate evaluation: By integrating evaluation into the development loop, we automate checks that would otherwise be manual, saving time and improving iteration cycles.
How We Extract Facts from Web Pages
Here’s the basic process:
Crawl the website: We first gather the web content.
Strip HTML tags: Clean the content to prepare it for language model processing (a sketch of these first two steps follows this list).
Prompt for fact extraction: We use an LLM prompt to extract relevant facts like page type, date, title, and content.
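A minimal sketch of the first two steps, assuming a requests + BeautifulSoup setup; the tag list and cleanup rules here are illustrative, not our production crawler:

```python
import requests
from bs4 import BeautifulSoup

def fetch_and_strip(url: str) -> str:
    """Fetch a page and return its visible text with HTML tags removed."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements that rarely contain extractable facts; this tag list
    # is an assumption, not the original pipeline's cleanup rules.
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    # Collapse whitespace so the LLM sees compact, readable text.
    return " ".join(soup.get_text(separator=" ").split())
```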
Here’s an example of the kind of prompt we use to identify the type of page:
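A minimal sketch, assuming an OpenAI-style chat API constrained to JSON output; the exact wording, the field set (page_type, title, date, content), and the model choice are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACTION_PROMPT = """\
You are given the text content of a webpage. Identify the type of page and
extract the key facts. Respond with a JSON object containing:
  - page_type: one of "blog post", "press release", "directory", or "other"
  - title: the page's title
  - date: the publication date as YYYY-MM-DD, or null if none is present
  - content: a one-paragraph summary of the main content

Webpage text:
{page_text}
"""

def extract_facts(page_text: str) -> dict:
    """Run the extraction prompt and parse the model's JSON response."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[
            {"role": "user", "content": EXTRACTION_PROMPT.format(page_text=page_text)}
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

Constraining the model to JSON output makes the result directly comparable to a reference object, which is exactly what the scorer described next relies on.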
Custom JSON Scorer for Fact Evaluation
To ensure that we’re extracting the right information, we need a robust way to evaluate the data. That’s where a custom JSON scorer comes in. This scorer evaluates two aspects:
Schema Scoring: It checks whether the structure of the JSON object matches the expected schema (e.g., whether all required fields are present).
Value Scoring: It compares the values of matched keys, measuring semantic similarity with cosine similarity over their embeddings.
For example, if the scraped page should be a "press release" but is misclassified as a "blog post," or if the extracted title doesn’t match the actual title on the page, the scorer will reflect these discrepancies.
Here’s a quick code snippet of the value scorer in action:
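A minimal sketch along those lines, with the schema check included for context; the embedding model and the 50/50 blend of schema and value scores are illustrative assumptions:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Embed a string for semantic comparison; the model is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def json_scorer(output: dict, expected: dict) -> float:
    """Score an extracted JSON object against a reference object.

    Schema score: fraction of expected keys present in the output.
    Value score: mean cosine similarity across the matched keys' values.
    """
    expected_keys = set(expected)
    matched_keys = expected_keys & set(output)
    schema_score = len(matched_keys) / len(expected_keys) if expected_keys else 1.0

    similarities = [
        cosine_similarity(embed(str(output[k])), embed(str(expected[k])))
        for k in matched_keys
    ]
    value_score = sum(similarities) / len(similarities) if similarities else 0.0

    # Equal weighting of the two aspects is an illustrative choice.
    return 0.5 * schema_score + 0.5 * value_score
```

With this, a page misclassified as a "blog post" instead of a "press release" still earns partial credit for the fields it did extract, while the low similarity on page_type drags the value score down.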
By integrating the JSON scorer with Braintrust, we can automatically evaluate the accuracy of the extracted facts. This allows us to iterate on the scraping process with real-time feedback.
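A sketch of that wiring with Braintrust's Python SDK, assuming the extract_facts and json_scorer functions above; the project name and dataset are placeholders:

```python
from braintrust import Eval

# A placeholder dataset of (stripped page text, expected facts) pairs.
dataset = [
    {
        "input": "Acme Corp today announced record third-quarter results...",
        "expected": {
            "page_type": "press release",
            "title": "Acme Corp Announces Record Q3 Results",
        },
    },
]

def braintrust_json_scorer(input, output, expected):
    """Adapt the custom JSON scorer to Braintrust's scorer signature."""
    return json_scorer(output, expected)

Eval(
    "webpage-fact-extraction",  # placeholder project name
    data=lambda: dataset,
    task=extract_facts,         # the extraction function sketched earlier
    scores=[braintrust_json_scorer],
)
```

Each run is recorded as an experiment in Braintrust, so a prompt change that hurts schema or value scores is visible run over run.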
Conclusion
Fact extraction from webpages is a powerful capability, but it’s only as reliable as the evaluation process behind it. By integrating Braintrust and building custom scoring mechanisms, we can ensure that the entities and facts we scrape are accurate and meaningful. As LLMs become more integrated into workflows, effective evaluation will be crucial to building reliable, high-quality AI-driven solutions.