1. Setup
1.1 Run Qdrant
A simple way to start Qdrant locally is via Docker:
```bash
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
```
This exposes the REST API on localhost:6333 and gRPC on localhost:6334, persisting data under ./qdrant_storage.
Alternatively, you can spin up an in‑memory instance (for quick tests) directly from Python:
```python
from qdrant_client import QdrantClient

client = QdrantClient(":memory:")  # in-memory instance, no Docker needed
```
1.2 Install the Python Client
Install Qdrant’s official Python client via pip:
```bash
pip install qdrant-client
```
This provides CRUD and search methods for your collections.
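With the server running and the client installed, a quick connectivity check is worthwhile before moving on. A minimal sketch, assuming Qdrant is listening on the default local port from step 1.1:
```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Lists existing collections; an empty list on a fresh instance is expected
print(client.get_collections())
```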
1.3 Install Jina Embeddings SDK
Use JinaAI’s Python SDK to generate text embeddings:
```bash
pip install jinaai
```
This gives you access to the jina-embeddings-v2-base-en (and other) models via a simple client API.
You can also install the full Jina framework if you plan to build pipelines beyond embeddings:
```bash
pip install jina
```
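If you'd rather avoid an SDK dependency (or your installed jinaai version exposes a different interface), the hosted Jina Embeddings API can also be called directly over HTTP. A minimal sketch, assuming you have an API key exported as JINA_API_KEY; the endpoint and response shape follow Jina's public embeddings API:
```python
import os
import requests

def embed(texts):
    """Embed a batch of texts via the hosted Jina Embeddings API."""
    resp = requests.post(
        "https://api.jina.ai/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
        json={"model": "jina-embeddings-v2-base-en", "input": texts},
        timeout=30,
    )
    resp.raise_for_status()
    # The response carries one {"embedding": [...]} object per input text
    return [item["embedding"] for item in resp.json()["data"]]
```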
2. Ingesting Your Data
Assume you have a pandas DataFrame df with columns image_url, image_id, and caption.
2.1 Initialize Clients
```python
import pandas as pd
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Distance, VectorParams
from jinaai import EmbeddingClient

# Qdrant: connect to your local (or in-memory) instance
qdrant = QdrantClient(url="http://localhost:6333")

# Jina: choose an embedding model
emb_client = EmbeddingClient(model="jina-embeddings-v2-base-en")
```
2.2 Create or Recreate a Collection
Define a collection named "images" with the appropriate vector size (768 for jina-embeddings-v2-base-en) and cosine distance:
```python
vector_dim = 768  # jina-embeddings-v2-base-en produces 768-dimensional vectors
qdrant.recreate_collection(
    collection_name="images",
    vectors_config=VectorParams(size=vector_dim, distance=Distance.COSINE),
)
```
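To confirm the collection was created as expected, you can read its configuration back. A short sketch using the client's get_collection call:
```python
info = qdrant.get_collection("images")
print(info.status)        # e.g. "green" once the collection is ready
print(info.points_count)  # 0 for a freshly created collection
```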
2.3 Encode Captions and Prepare Points
Convert captions into vectors, then bundle each with its URL and ID as payload:
```python
# Load your DataFrame
# df = pd.read_csv("your_images.csv")

# Encode all captions in one batch
captions = df["caption"].tolist()
vectors = emb_client.encode(captions)  # returns List[List[float]]

# Build one PointStruct per row, pairing each vector with its payload.
# enumerate(itertuples()) keeps positions aligned with the vectors list
# even when the DataFrame index is not a clean 0..n-1 range.
points = [
    PointStruct(
        id=int(row.image_id),
        vector=vectors[i],
        payload={"image_url": row.image_url, "image_id": row.image_id},
    )
    for i, row in enumerate(df.itertuples())
]
```
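Before upserting, it can be worth asserting that the embedding width matches the collection's configured size (a one-line sanity check, assuming vector_dim from step 2.2):
```python
# Catch model/collection dimension mismatches before they surface as server errors
assert all(len(v) == vector_dim for v in vectors), "embedding size != collection size"
```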
2.4 Upsert into Qdrant
Send your points to Qdrant:
```python
qdrant.upsert(collection_name="images", points=points)
```
This indexes your embeddings and stores the associated payloads for filtering and retrieval.
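For large datasets, a single upsert call can get unwieldy; splitting the points into fixed-size batches keeps request sizes manageable. A minimal sketch, with a hypothetical batch size you should tune to your own payloads:
```python
BATCH_SIZE = 256  # hypothetical; tune to your payload sizes and network limits

for start in range(0, len(points), BATCH_SIZE):
    qdrant.upsert(
        collection_name="images",
        points=points[start:start + BATCH_SIZE],
        wait=True,  # block until this batch is persisted
    )
```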
3. Performing a Search
To find the top 10 images most similar to a new text query:
3.1 Embed the Query
```python
query_text = "sunset over mountains"
query_vector = emb_client.encode([query_text])[0]
```
3.2 Run the Nearest-Neighbor Search
```python
results = qdrant.search(
    collection_name="images",
    query_vector=query_vector,
    limit=10,
)
```
Each result includes the id, score, and your payload (with image_url and image_id).
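The search call returns a list of scored points, ordered from most to least similar. A short usage sketch:
```python
for hit in results:
    print(f"{hit.score:.3f}  {hit.payload['image_url']}")
```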
Further Resources
- Full Qdrant quickstart & tutorials: qdrant.tech
- Qdrant GitHub examples: github.com/qdrant
- Airbyte's beginner guide: airbyte.com
- AllDevStack REST tutorial: alldevstack.com