The Problem Most Tutorials Don’t Mention
You’ve probably heard about Gemini and wanted to try it, but then hit a wall: the official docs assume you already know which model to pick, which SDK to install, and how multimodal inputs even work. You end up with 15 browser tabs open and still no working code.
I went through the same thing when I first started integrating Gemini into a side project. What finally helped: nailing three core concepts first, then writing code that actually runs — not just copy-paste examples that fail silently.
This guide covers exactly that. By the end, you’ll have a working Python script that calls Gemini, and you’ll know when to use which model and how to handle text, images, and streamed responses.
Core Concepts You Need to Know First
1. The Gemini Model Family
Google offers several Gemini models, each with a different cost/capability tradeoff:
- Gemini 2.0 Flash — Fast, cheap, great for production workloads where you need speed at scale.
- Gemini 1.5 Pro — Larger context window (up to 1M tokens), better reasoning, higher cost. Use this for long documents or complex analysis.
- Gemini 1.5 Flash — Sits between 2.0 Flash and 1.5 Pro in cost and capability. A good default for most use cases.
For 90% of junior developer projects, start with gemini-2.0-flash. It’s fast and the free tier is generous enough to prototype without spending anything.
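If it helps to see that decision as code, here is a toy helper encoding the tradeoffs above. The model IDs are the ones this guide uses; the function itself is illustrative, not part of any SDK.

```python
def pick_model(needs_long_context: bool = False, needs_best_reasoning: bool = False) -> str:
    """Encode the rule of thumb: default to 2.0 Flash, reach for 1.5 Pro
    only when you need the 1M-token window or stronger reasoning."""
    if needs_long_context or needs_best_reasoning:
        return "gemini-1.5-pro"
    return "gemini-2.0-flash"
```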
2. Multimodal Inputs
Most model APIs accept only text, or bolt vision on through a separate endpoint. Gemini natively handles text, images, audio, video, and documents — all in the same request. No separate vision endpoint, no extra SDK. You just pass image bytes alongside your text prompt.
3. The Two SDK Approaches
Two Python libraries exist for Gemini:
- google-genai — The newer, recommended SDK. Cleaner API, supports streaming, async, and all current models.
- google-generativeai — The older SDK. Still works but being phased out for new features.
Use google-genai for anything new. That’s what this tutorial uses.
Hands-On Practice
Step 1: Get Your API Key
Go to Google AI Studio, sign in with a Google account, and click Get API Key. Copy the key — you’ll need it in a moment.
The free tier gives you 15 requests per minute and 1 million tokens per day on Flash models. More than enough to build and test a real project.
Step 2: Install the SDK
pip install google-genai
Or if you’re using a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install google-genai
Step 3: Your First API Call
Create a file called gemini_test.py and add this:
import os
from google import genai
# Best practice: store the key in an environment variable
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain what a REST API is in 3 sentences, for someone who just started coding."
)
print(response.text)
Run it:
export GEMINI_API_KEY="your-api-key-here"
python gemini_test.py
You should get a clean, short explanation back within 1-2 seconds.
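A common first failure is forgetting to export the key: `os.environ["GEMINI_API_KEY"]` then raises a bare KeyError. A small guard (my addition, not part of the SDK) gives a friendlier message:

```python
import os
import sys

def require_api_key() -> str:
    """Exit with an actionable message instead of a bare KeyError."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        sys.exit('GEMINI_API_KEY is not set. Run: export GEMINI_API_KEY="your-api-key-here"')
    return key

# client = genai.Client(api_key=require_api_key())
```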
Step 4: Streaming Responses
For longer outputs, streaming gives users immediate feedback instead of waiting for the full response. This matters a lot in chat interfaces.
import os
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a Python function that parses a CSV file and returns a list of dicts."
):
    print(chunk.text, end="", flush=True)
print()  # newline at the end
Each chunk.text arrives as Gemini generates it. Your users see output start appearing in under 500ms instead of waiting 5+ seconds for the full response.
Step 5: Analyzing an Image
Image analysis is where Gemini genuinely stands out. Pass an image along with a question and get a detailed analysis back.
import os
import httpx
from google import genai
from google.genai import types
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
# Fetch an image from a URL
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/ThreeTimesMooresTrees.png/1280px-ThreeTimesMooresTrees.png"
image_bytes = httpx.get(image_url).content
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "What does this chart show? Summarize in 2 sentences."
    ]
)
print(response.text)
You can also load images from local disk:
with open("screenshot.png", "rb") as f:
    image_bytes = f.read()
# Then use Part.from_bytes as above
Step 6: Multi-Turn Chat
Building a chatbot means managing conversation history. The SDK handles this automatically:
import os
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
chat = client.chats.create(model="gemini-2.0-flash")
# First turn
response = chat.send_message("I'm building a FastAPI app. What's the minimal setup I need?")
print("Gemini:", response.text)
# Second turn — Gemini remembers the context
response = chat.send_message("How do I add JWT authentication to that?")
print("Gemini:", response.text)
The chat object automatically maintains conversation history. You don’t need to manually track and send previous messages.
Step 7: Controlling the Output
Generation parameters let you tune exactly how Gemini responds — three of them matter most:
from google.genai import types
config = types.GenerateContentConfig(
temperature=0.2, # Lower = more deterministic (good for code)
max_output_tokens=512, # Cap the response length
system_instruction="You are a senior Python developer. Be concise and direct."
)
response = client.models.generate_content(
model="gemini-2.0-flash",
contents="What's the difference between a list and a tuple in Python?",
config=config
)
print(response.text)
Key parameters to know:
- temperature: 0.0–2.0. Use 0.0–0.3 for factual/code tasks, 0.7–1.0 for creative writing.
- max_output_tokens: Prevents runaway responses that cost more than expected.
- system_instruction: Sets the persona and behavior for every message in the conversation.
Step 8: Handling Errors Properly
API calls fail. Rate limits happen. Always wrap calls in error handling:
import os
import time

from google import genai
from google.genai import errors  # the google-genai SDK raises errors.APIError subclasses

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def call_gemini_with_retry(prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            response = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=prompt
            )
            return response.text
        except errors.APIError as e:
            if e.code == 429:  # rate limited
                wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
                print(f"Rate limited. Waiting {wait}s before retry...")
                time.sleep(wait)
            else:
                # Bad request, auth failure, etc. Retrying won't help.
                print(f"API error {e.code}: {e}")
                break
    return ""
result = call_gemini_with_retry("Explain async/await in Python.")
print(result)
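Exponential backoff reacts to rate limits after they happen. On the free tier's 15-requests-per-minute cap you can also throttle proactively. Here is a minimal client-side limiter, a sketch of my own rather than anything in the SDK:

```python
import time

class RateLimiter:
    """Blocks so consecutive calls never exceed max_per_minute (client-side only)."""

    def __init__(self, max_per_minute: int = 15):
        self.min_interval = 60.0 / max_per_minute
        self._last_call = float("-inf")  # first call never sleeps

    def wait(self) -> None:
        sleep_for = self.min_interval - (time.monotonic() - self._last_call)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last_call = time.monotonic()

limiter = RateLimiter(max_per_minute=15)
# limiter.wait()  # call before each client.models.generate_content(...)
```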
What to Build Next
With a working foundation in place, here are four directions worth building toward:
- Document Q&A: Upload a PDF using the Files API and ask questions about it. Gemini 1.5 Pro handles up to 1M tokens, so even large documents work.
- Image content moderation: Pass user-uploaded images to Gemini and ask it to flag inappropriate content before storing them.
- Code review assistant: Pipe git diffs into Gemini with a system prompt that acts as a strict senior reviewer.
- Structured output: Use response_mime_type="application/json" in the config to force JSON output — useful when you need to parse the response in your app.
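As a quick taste of that last idea, here is a sketch. The SDK accepts a plain dict for config, and the helper name, prompt, and parsing step are my own additions:

```python
import json

def ask_for_json(client, prompt: str):
    """Request JSON output and parse it before handing it to the rest of the app.
    Assumes a google-genai client; config may be passed as a plain dict."""
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)

# Usage (requires the SDK and GEMINI_API_KEY):
# from google import genai
# client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
# data = ask_for_json(client, "List three Python web frameworks as a JSON array "
#                             "of objects with 'name' and 'why' keys.")
```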
The Pattern That Actually Works in Production
Treat the AI call as one step in a pipeline — not the whole product. Keep your prompts version-controlled. Log every request and response so you can debug when something goes sideways. Always have a fallback for when the API is down.
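The fallback point deserves a concrete shape. This is a generic wrapper of my own; the canned message is a placeholder, and any API-calling function (such as the retry helper from Step 8) can be passed in:

```python
def with_fallback(ask, prompt: str,
                  fallback: str = "The assistant is temporarily unavailable.") -> str:
    """Wrap any API-calling function so downstream code always gets a string."""
    try:
        reply = ask(prompt)
        return reply if reply else fallback
    except Exception:
        # Log the real error in production; never let the API take your app down.
        return fallback

# Usage: answer = with_fallback(call_gemini_with_retry, "Summarize this ticket.")
```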
Start with gemini-2.0-flash, measure latency and quality, then upgrade to Pro only if Flash genuinely can’t handle your use case. Most of the time, it can.
The Google AI Studio playground is worth bookmarking too — prototype prompts there interactively before wiring them into code. It cuts a lot of trial-and-error cycles.

