Usage
======

This chapter introduces how to use **metadata-crawler** to collect metadata
from files stored in various backends and index the results into an index
system. You can drive the crawler either from the Python API or via the
provided command-line interface. The library supports both synchronous and
asynchronous workflows.

The general workflow of collecting metadata is separated into *two* steps:

1. Harvesting metadata and storing the crawled data in a **metadata store**.
   This step decouples the crawling from the indexing procedure.
   Supported metadata stores are **intake catalogues**, **MongoDB** and
   **PostgreSQL**.
2. Indexing the metadata into the index backend.

The harvesting supports versioned datasets. Dataset versions are stored in two
different collections: one that holds *all* dataset versions and one that holds
only the *latest* dataset versions. This distinction lets users quickly access
relevant datasets without having to take dataset versions into account
(*latest* versions only).

.. versionadded:: 2605.0.0

   **MongoDB** and **PostgreSQL** backends support mark-and-sweep clean-up of
   stale records. Every record written during a crawl is stamped with a
   timestamp. When the crawl finishes, records older than a configurable grace
   period are automatically removed. This prevents stale entries (files that
   have been moved, renamed, or deleted) from accumulating across successive
   crawls.

.. toctree::
   :maxdepth: 1

   sec1-cli
   sec2-python
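
The mark-and-sweep clean-up described above can be illustrated with a minimal,
self-contained sketch. It is not the crawler's actual implementation: the
in-memory ``store`` dictionary, the ``mark`` and ``sweep`` helpers, the
``last_seen`` field and the file paths are all hypothetical names chosen for
illustration, and a real MongoDB or PostgreSQL backend would express the sweep
as a delete query instead.

.. code-block:: python

   from datetime import datetime, timedelta, timezone


   def mark(store, key, metadata, now):
       """Write a record and stamp it with the time of the current crawl."""
       # Hypothetical record layout; the real stores may use other fields.
       store[key] = {"metadata": metadata, "last_seen": now}


   def sweep(store, now, grace_period):
       """Remove every record whose stamp is older than the grace period."""
       cutoff = now - grace_period
       stale = [key for key, rec in store.items() if rec["last_seen"] < cutoff]
       for key in stale:
           del store[key]
       return stale


   # A first crawl sees two files.
   store = {}
   first_crawl = datetime(2025, 1, 1, tzinfo=timezone.utc)
   mark(store, "/data/a.nc", {"variable": "tas"}, first_crawl)
   mark(store, "/data/b.nc", {"variable": "pr"}, first_crawl)

   # Two days later only a.nc is found; b.nc was deleted in the meantime.
   second_crawl = first_crawl + timedelta(days=2)
   mark(store, "/data/a.nc", {"variable": "tas"}, second_crawl)

   # Sweep once the crawl has finished: b.nc is older than the grace period.
   removed = sweep(store, second_crawl, grace_period=timedelta(hours=1))

After the sweep, ``removed`` contains only ``/data/b.nc`` and the store keeps
the record for ``/data/a.nc``. The grace period protects records written by a
crawl that started shortly before the sweep.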