This page documents the automated crawlers operated by YOSA (yosa.ai). If you've found one of these user agents in your server logs, this page explains what it is, why it's visiting your site, and how to control its access.
List of official bots operated by YOSA
- YOSA-Crawler
- YOSA-FaviconFetcher
YOSA-Crawler
User-Agent: YOSA-Crawler/2.0.0 (+https://yosa.ai)
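To check whether a request in your logs carries this user agent, a log line can be matched against the format shown above. The following is a minimal sketch, not an official tool: the function name `find_yosa_bot` and the sample log line are hypothetical, and the pattern assumes that future versions keep the same `YOSA-<Name>/<major>.<minor>.<patch> (+https://yosa.ai)` shape. Keep in mind that a user-agent string can be spoofed, so a match alone does not prove that a request came from YOSA.

```python
import re

# Pattern for the user-agent format documented on this page.
# Assumption: the version part may change, so any x.y.z version is accepted.
YOSA_UA = re.compile(
    r"YOSA-(?P<bot>[A-Za-z]+)/(?P<version>\d+\.\d+\.\d+) \(\+https://yosa\.ai\)"
)


def find_yosa_bot(log_line: str):
    """Return (bot_name, version) if the line contains a YOSA user agent, else None."""
    match = YOSA_UA.search(log_line)
    if match is None:
        return None
    return match.group("bot"), match.group("version")


# Hypothetical access-log line (the IP address is from a documentation range).
sample = (
    '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "YOSA-Crawler/2.0.0 (+https://yosa.ai)"'
)

print(find_yosa_bot(sample))
```

The same pattern also matches the YOSA-FaviconFetcher user agent, since both follow the same naming scheme.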
What it does
YOSA-Crawler indexes the content of websites added as projects in YOSA. The indexed content is used to build a project's Knowledge Base - a contextual layer that improves the relevance and accuracy of AI-generated content for that site.
The crawler visits publicly accessible pages and extracts text content. It does not submit forms, execute JavaScript interactions, log in to restricted areas, or store personal data beyond what is necessary to build the Knowledge Base.
When it visits
YOSA-Crawler only crawls websites that have been explicitly added as a project by a YOSA user. It does not crawl sites speculatively or without a user initiating it.
Crawls are triggered in three situations:
- When a new project is created
- Once a month after the first crawl, to update the existing Knowledge Base with any content added or changed on the website in the meantime
- When a user manually requests a Knowledge Base refresh
YOSA-FaviconFetcher
User-Agent: YOSA-FaviconFetcher/1.0.0 (+https://yosa.ai)
What it does
YOSA-FaviconFetcher locates and retrieves the favicon of a website when it is added as a project in YOSA. The favicon is used for display purposes within the YOSA interface to help users identify their projects visually.
It fetches your domain's home page each time you update the relevant project to find the favicon URL specified in the page's <head> element. It then attempts to fetch that favicon to confirm its existence and stores its address in our database.
It does not index content or visit any other pages.
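The lookup described above, reading the favicon URL from the home page's <head> element, can be sketched roughly as follows. This is an illustration using Python's standard library, not YOSA's actual implementation: the class `FaviconLinkParser`, the function `find_favicon_url`, and the sample HTML are hypothetical, and the sketch assumes the favicon is declared with a `<link>` tag whose `rel` attribute contains `icon`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FaviconLinkParser(HTMLParser):
    """Collects href values of <link rel="... icon ..."> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.icon_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            # rel may hold several tokens, e.g. "shortcut icon".
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "icon" in rel_tokens and attr.get("href"):
                self.icon_hrefs.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_favicon_url(page_url, html):
    """Return the absolute URL of the first declared favicon, or None if none is declared."""
    parser = FaviconLinkParser()
    parser.feed(html)
    if parser.icon_hrefs:
        # Resolve relative hrefs against the page URL.
        return urljoin(page_url, parser.icon_hrefs[0])
    return None


# Hypothetical home page markup.
sample_html = (
    "<html><head><title>Example</title>"
    '<link rel="shortcut icon" href="/static/favicon.ico">'
    "</head><body></body></html>"
)
print(find_favicon_url("https://example.com/", sample_html))
```

The sketch returns None when no icon link is declared. What the real fetcher does in that case is not documented here, so no fallback is shown.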
Abuse and misuse
If you believe a YOSA crawler is behaving incorrectly - ignoring robots.txt, crawling too aggressively, or accessing pages it shouldn't - please contact us at [email protected] with your domain and the relevant server logs.
We take crawler conduct seriously and will investigate promptly.
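Before reporting a suspected robots.txt violation, it can help to confirm what your rules actually permit. The sketch below uses Python's standard `urllib.robotparser` to evaluate a rule set for a given user agent. It rests on assumptions: this page does not state which token the crawlers match in robots.txt, so `YOSA-Crawler` is used as a guess based on the user-agent string, the rules and URLs are hypothetical, and Python's matching logic may differ in detail from the crawler's own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "YOSA-Crawler" token is an assumption
# taken from the user-agent string; it is not confirmed by this page.
ROBOTS_TXT = """\
User-agent: YOSA-Crawler
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


crawler_ua = "YOSA-Crawler/2.0.0 (+https://yosa.ai)"
print(is_allowed(ROBOTS_TXT, crawler_ua, "https://example.com/private/page"))
print(is_allowed(ROBOTS_TXT, crawler_ua, "https://example.com/blog"))
```

With these sample rules, the first check reports that the path under /private/ is disallowed for the crawler and the second that /blog is allowed. If your logs show requests to a path your rules disallow, include both the rules and the log lines in your report.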