Everything a serious scraping team needs.
Lizard isn't just a scraping library. It's a full extraction platform with compliance, intelligence, and observability baked in from day one.
Your compliance team will thank you.
Lizard generates a full audit trail for every HTTP request, data access event, and schema mutation. Built-in data lineage tracking lets you trace any extracted record back to its source, which is the kind of evidence SOC2 Type II auditors ask for.
Automatic audit logging
Every request, response, and data transformation is logged with timestamps, user context, and session IDs.
Data lineage tracking
Trace any data point back to its origin URL, timestamp, and extraction rule.
Access controls
Role-based access to spiders, outputs, and compliance reports.
Retention policies
Configure data retention per-spider with automatic expiry and deletion logs.
Automated reports
Generate SOC2-ready compliance reports on demand for auditors.

    import lizard


    class AuditedSpider(lizard.Spider):
        name = "products"

        # SOC2 profile: logs everything
        compliance = lizard.SOC2Profile(
            log_level="full",
            data_lineage=True,
            retention_days=90,
            access_control=["team:engineering"],
        )

        async def parse(self, response):
            # Every yield is audit-logged
            yield {
                "product_id": response.css("h1::text").get(),
                "price": response.css(".price::text").get(),
            }

The only framework that takes compliance seriously.
| Capability | Lizard | Scrapy | Crawlee | Playwright |
|---|---|---|---|---|
| SOC2 compliance built-in | ||||
| AI-powered field remapping | ||||
| Entity persistence across time | ||||
| Distributed tracing (zero config) | ||||
| Real-time metrics dashboard | ||||
| GDPR/CCPA handling | ||||
| Hosted scheduler | ||||
| Team access controls | ||||
| Audit log export | ||||
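
To make the audit logging, data lineage, retention, and export features more concrete, here is a minimal plain-Python sketch of how such a record could be modelled. It does not use Lizard's API and does not describe its real log schema: `AuditEntry`, `is_expired`, `export_jsonl`, the field names, and the JSON Lines output format are all assumptions chosen for illustration. The 90-day window mirrors the `retention_days=90` setting in the spider example.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record; field names are illustrative, not Lizard's schema."""
    timestamp: datetime      # when the value was extracted (UTC)
    user: str                # user context
    session_id: str          # session that issued the request
    source_url: str          # lineage: origin URL
    extraction_rule: str     # lineage: selector that produced the value


def is_expired(entry: AuditEntry, retention_days: int, now: datetime) -> bool:
    """Return True once the entry is older than the retention window."""
    return now - entry.timestamp > timedelta(days=retention_days)


def export_jsonl(entries, stream) -> None:
    """Write entries as JSON Lines: one JSON object per line."""
    for entry in entries:
        row = asdict(entry)
        row["timestamp"] = entry.timestamp.isoformat()  # datetime is not JSON-serializable
        stream.write(json.dumps(row, sort_keys=True) + "\n")


entry = AuditEntry(
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
    user="alice",
    session_id="sess-001",
    source_url="https://example.com/products/42",
    extraction_rule=".price::text",
)

# With a 90-day window, the entry is kept 30 days in and expired 91 days in.
print(is_expired(entry, 90, datetime(2024, 1, 31, tzinfo=timezone.utc)))  # False
print(is_expired(entry, 90, datetime(2024, 4, 1, tzinfo=timezone.utc)))   # True

# Export to an in-memory buffer; a real export would likely target a file or stream.
buffer = io.StringIO()
export_jsonl([entry], buffer)
print(buffer.getvalue(), end="")
```

Keeping the lineage fields on the same record as the user and session context means a single exported line is enough to answer "where did this value come from, and who fetched it?", which is the question the lineage feature is meant to address.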