ChromaDB Indexing
Reindex All CVs: Clears the existing index and reindexes all user CVs (including applicants and hired employees) to update the vector database. This process extracts text from PDFs, generates embeddings, and stores them in ChromaDB for search.
Index New CVs: Indexes only new CVs that haven't been indexed yet. This is faster than reindexing all CVs and is useful for adding new users to the search database.
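The difference between the two actions comes down to which CVs are selected for processing. A minimal sketch of the "new only" selection follows. It assumes each CV has a stable ID and that the IDs already in the index can be listed. The function name `select_unindexed` and both parameters are hypothetical, since how the application tracks indexed CVs is not stated here.

```python
from __future__ import annotations

from typing import Hashable, Iterable


def select_unindexed(
    all_cv_ids: Iterable[Hashable],
    indexed_ids: Iterable[Hashable],
) -> list[Hashable]:
    """Return the CV IDs that are not yet in the index, preserving input order.

    Hypothetical helper: the real application may track indexed CVs differently.
    """
    indexed = set(indexed_ids)  # set lookup keeps the filter linear in the number of CVs
    return [cv_id for cv_id in all_cv_ids if cv_id not in indexed]


# Example: CVs 2 and 4 are already indexed, so only 1 and 3 need processing.
pending = select_unindexed([1, 2, 3, 4], [2, 4])
```

A full reindex would skip this filter, clear the index, and process every CV.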
About ChromaDB Indexing
- CV text is extracted from PDFs stored in the GCP bucket
- Text is split into chunks of 1000 words each, with a 200-word overlap between consecutive chunks
- Each chunk is converted to a vector embedding using Sentence Transformers
- Embeddings are stored in ChromaDB for semantic search
- All users are indexed: both applicants (current_job_id IS NULL) and hired employees (current_job_id IS NOT NULL)
- Only CVs stored in the GCP bucket (not external URLs) are indexed
- Skills extraction has been removed for faster indexing performance
- Reindexing clears the existing index and rebuilds the entire vector database
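The chunking step described above (1000-word chunks with a 200-word overlap) can be sketched as follows. This is an illustration under stated assumptions, not the application's actual code: the function name `chunk_words` is hypothetical, and words are assumed to be split on whitespace.

```python
from __future__ import annotations


def chunk_words(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into word-based chunks where consecutive chunks share `overlap` words.

    Hypothetical helper; whitespace tokenization is an assumption.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # each new chunk starts this many words after the previous one
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

With the default values, each chunk starts 800 words after the previous one, so a 2000-word CV yields three chunks covering words 0-999, 800-1799, and 1600-1999.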
Note: Reindexing runs in the background, so you can continue using the application while it processes. Progress updates automatically. Reindexing may take several minutes, depending on the number of CVs.