
Week 2, Office Hour 2 (May 29)

Study Notes:

I hosted an office hours session focused on retrieval-augmented generation (RAG) implementation challenges and strategies. Participants shared their experiences with the course materials and discussed specific technical issues they're facing in their RAG projects, from processing technical documentation to handling irrelevant data in vector databases.


If you want to learn more about RAG systems, check out our RAG Playbook course. Here is a 20% discount code for readers.

RAG Playbook - 20% off for readers


How can I apply course concepts to my actual project while balancing time constraints?

Several participants expressed the challenge of finding time to apply course concepts to their real-world projects while managing full-time jobs. One participant noted, "I have a day job with a packed schedule. I already have to make room for lectures and these conversations, which leaves very little time to apply this to my project."

This is a common challenge when learning new technical skills alongside existing responsibilities. For those in this situation, I recommend focusing on completing the course first and then applying the knowledge afterward. The community will remain active even after the course ends, with the Slack channel staying open and all videos remaining available.

For those who need more immediate application, consider reaching out about a consulting engagement after the course. The reality is that deep implementation often requires dedicated time that's difficult to carve out while maintaining other responsibilities.

Key Takeaway: Learning and implementation often need to be sequenced rather than parallel when you have limited time. Focus on absorbing the knowledge first, then plan dedicated time for application afterward.

What happens to the community after the course ends?

While we won't have structured bi-weekly meetings after the course concludes, the Slack channel will remain active, and I'll check it regularly to share resources and interesting developments. All course materials, including videos and the Maven pages, will remain accessible.

The community's activity level will largely depend on participant engagement - "it's basically going to be like however much you put in is what you're going to get out." We don't have a community manager pushing conversations, as my goal isn't to maximize message volume.

Many valuable interactions happen through direct messages rather than in public channels. For example, one participant is about to launch their own company, and we're jumping on calls to discuss their ideas and make introductions.

Key Takeaway: The community will continue beyond the formal course structure, but its value will depend on your active participation and willingness to engage with others.

How should I handle irrelevant data being pushed into my vector database?

One participant working on an application with high-performance RAG requirements asked about the impact of irrelevant data in their vector database: "How much do I need to worry if there's irrelevant data being pushed into our vector database? Is it not that big of a deal because we have metadata filtering and good retrieval, or is it a big deal?"

This concern tends to be model-specific. Foundation model companies have been optimizing for recall after discovering the "needle in a haystack" problem, where models struggled to find specific information buried in large contexts. While this improved recall, it made models more sensitive to precision issues.

The real risk now is that low precision might hurt your language model's ability to reason correctly. When dealing with irrelevant data, consider whether it's "adversarially irrelevant" - is the data actually conflicting rather than just unnecessary?

For example, in construction documentation, you might have an email saying a wall is yellow, an architect's note saying it's blue, and a text message claiming it's purple. In these cases, you need to establish authority hierarchies or time-based weighting to resolve conflicts.
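As a rough sketch of how an authority hierarchy plus time-based weighting could be applied at retrieval time (the SOURCE_AUTHORITY table and resolve_conflict helper below are illustrative assumptions, not part of any specific system):

```python
from datetime import datetime

# Hypothetical authority ranking: higher values win when sources disagree.
SOURCE_AUTHORITY = {
    "architect_note": 3,
    "email": 2,
    "text_message": 1,
}

def resolve_conflict(chunks: list[dict]) -> dict:
    """Pick the most trustworthy chunk among conflicting statements.

    Each chunk is assumed to carry source_type and timestamp metadata.
    Authority is compared first and recency second, so an architect's
    note beats a newer text message.
    """
    return max(
        chunks,
        key=lambda c: (
            SOURCE_AUTHORITY.get(c["source_type"], 0),
            datetime.fromisoformat(c["timestamp"]),
        ),
    )

# Three conflicting claims about the same wall color.
claims = [
    {"text": "The wall is yellow", "source_type": "email", "timestamp": "2024-03-01"},
    {"text": "The wall is blue", "source_type": "architect_note", "timestamp": "2024-02-15"},
    {"text": "The wall is purple", "source_type": "text_message", "timestamp": "2024-03-10"},
]
print(resolve_conflict(claims)["text"])  # -> The wall is blue
```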

Key Takeaway: The impact of irrelevant data depends on whether it's merely unnecessary or actively conflicting. Modern models are optimized for high recall but can be sensitive to precision issues, so conflicting information can be particularly problematic.

What metrics should I monitor for retrieval quality in production?

When asked about vector databases providing retrieval quality measurements, I recommended focusing on metrics you can monitor yourself rather than trusting vendor-provided metrics.

Consider tracking the average cosine distance of your queries over time. If this metric suddenly changes, it could indicate a shift in your data or user behavior. For example, in a previous recommendation system I built, we monitored cosine distance between products and noticed a sudden drop. After investigating by segmenting the data by signup date, gender, and life stage, we discovered we had onboarded many young users through a Super Bowl ad campaign who couldn't afford our $300 clothing items.

You might also monitor average re-ranker scores and look for changes over time or across different user segments. These metrics are more valuable than arbitrary tests created by vector database providers.
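Here is a minimal sketch of that kind of monitoring, assuming you log one row per query with its top-hit cosine distance, re-ranker score, and a segmentation variable; the column names and values are illustrative.

```python
import pandas as pd

# Assumed log schema: one row per query, with its retrieval metrics and
# whatever segmentation variables you care about (cohort, signup date, ...).
logs = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-01", "2024-05-02", "2024-05-02"]),
    "segment": ["new_user", "returning", "new_user", "returning"],
    "cosine_distance": [0.21, 0.18, 0.35, 0.19],  # query vs. top retrieved chunk
    "reranker_score": [0.82, 0.88, 0.61, 0.86],
})

# Watch for shifts over time rather than judging absolute values.
daily = logs.groupby("date")[["cosine_distance", "reranker_score"]].mean()
print(daily)

# Segmenting the same metric often points to the root cause of a shift,
# e.g. a new cohort whose queries look nothing like your corpus.
by_segment = logs.groupby(["date", "segment"])["cosine_distance"].mean().unstack()
print(by_segment)
```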

Key Takeaway: Focus on monitoring changes in metrics like average cosine distance rather than absolute values, and segment your analysis by relevant variables to identify the root causes of any shifts.

What's the best approach for processing complex technical documentation?

A participant working on processing technical manuals for question answering described their current approach: "We're leveraging the internal structure of the document, taking sections, splitting them, but including the hierarchy of titles - section, chapter, and manual title. But it feels naive to me."
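For concreteness, that baseline might look something like the sketch below; the Section dataclass, field names, and character-based splitting are illustrative assumptions rather than the participant's actual code.

```python
from dataclasses import dataclass

@dataclass
class Section:
    manual_title: str
    chapter: str
    title: str
    body: str

def to_chunks(sec: Section, max_chars: int = 1500) -> list[dict]:
    """Split a section's body and prepend the title hierarchy to each chunk,
    so every embedded passage carries its own context."""
    header = f"{sec.manual_title} > {sec.chapter} > {sec.title}"
    pieces = [sec.body[i:i + max_chars] for i in range(0, len(sec.body), max_chars)]
    return [
        {
            "text": f"{header}\n\n{piece}",
            "metadata": {
                "manual": sec.manual_title,
                "chapter": sec.chapter,
                "section": sec.title,
            },
        }
        for piece in pieces
    ]
```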

This challenge is common when dealing with structured technical content. One approach is to use traversal rather than pure semantic search - similar to how code-based agents navigate repositories. Instead of embedding everything, the system can navigate the document structure to find relevant information.

For example, when working with Brazilian tax codes (400-page PDFs), we implemented a system that traversed the documents using a combination of semantic search, full-text search, and grep-like tools. The system could navigate from main sections to specific appendices to find relevant information.
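As a hedged illustration of what such a traversal toolkit might look like (not the actual tax-code system; the function names and in-memory document model are assumptions for this sketch):

```python
import re

# Three illustrative retrieval tools an agent could call to traverse a
# structured document instead of relying on embeddings alone.

def grep_sections(sections: dict[str, str], pattern: str) -> list[str]:
    """Return IDs of sections whose text matches a regex, grep-style."""
    return [sid for sid, text in sections.items()
            if re.search(pattern, text, re.IGNORECASE)]

def list_children(toc: dict[str, list[str]], section_id: str) -> list[str]:
    """Expose the table of contents so the agent can walk from a main
    section down to its appendices."""
    return toc.get(section_id, [])

def read_section(sections: dict[str, str], section_id: str) -> str:
    """Fetch the full text of one section once the agent has located it."""
    return sections[section_id]

# An agent loop would mix these with semantic and full-text search: search
# or grep to find an entry point, then list_children/read_section to drill
# down, logging every hop as traversal data for later embedding improvements.
```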

The key insight is that traversal is still a form of retrieval. As you collect traversal data, you can use it to improve your embedding models, potentially reducing the need for complex traversal in the future.

Key Takeaway: For complex technical documentation, consider combining semantic search with structural traversal. Use the document's inherent organization to guide your retrieval process, and collect this data to improve your embedding models over time.

Should I build complex hierarchical structures for document retrieval?

When discussing whether to build sophisticated graph structures for document retrieval, I emphasized the importance of getting to usable data quickly: "My metric is: whatever I build should be the thing that gets me to 10,000 rows in a CSV file."

Rather than spending extensive time modeling tax laws as a graph or building complex hierarchical indexes upfront, I recommend chunking everything, getting a working system, understanding the problems, and creating examples. This data-driven approach allows you to identify patterns that can inform more sophisticated solutions later.
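A minimal sketch of what "getting to 10,000 rows" can look like in practice, assuming you simply log every (query, retrieved chunk) pair with a blank relevance column to label later; the CSV schema and helper name are illustrative assumptions.

```python
import csv
import os

# Illustrative schema: one row per (query, retrieved chunk) pair, plus a
# relevance column you fill in later by hand or with an LLM judge.
FIELDS = ["query", "chunk_id", "chunk_text", "score", "relevant"]

def log_retrievals(path: str, query: str, results: list[dict]) -> None:
    """Append each retrieved chunk for a query to a running CSV.

    The goal is volume, not perfection: a few thousand rows of real
    retrievals reveal failure patterns no upfront schema design will.
    """
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for r in results:
            writer.writerow({
                "query": query,
                "chunk_id": r["id"],
                "chunk_text": r["text"],
                "score": r["score"],
                "relevant": "",  # to be labeled later
            })
```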

The better lesson in AI development is that segmenting and solving individual problems can help you make progress now, while preparing unified datasets that will allow you to combine approaches when technology improves. This mirrors the evolution of speech-to-text systems, which initially required separate stages for waveforms, phonemes, and words before end-to-end solutions became viable.

Key Takeaway: Focus on collecting data and building working solutions rather than perfect architectures. The insights gained from real usage will guide your more sophisticated implementations later.

How should we think about the relationship between RAG and agents?

An interesting perspective emerged during our discussion: "RAG is like the superpower for AI right now." We explored how the boundaries between RAG and other AI capabilities are blurring, with one participant noting "grep is RAG" - highlighting that any method of retrieving context for an AI system shares fundamental similarities with RAG.

I've been considering whether we should rename the course to focus on "RAG applications" since modern AI systems are essentially exposing a portfolio of tools to agents. Whether you're using semantic search or a grep-like function to pull in relevant code, you're still finding information to enhance the context available to the model.

The core principle remains the same: "It has to be put into the context at the right time so that you can get the response correct." This perspective frames RAG not just as a specific technique but as a fundamental paradigm for augmenting AI capabilities with relevant information.
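To make the "portfolio of tools" framing concrete, here is an illustrative pair of tool schemas written in the JSON-schema style most LLM APIs accept; the names and fields are assumptions for the sketch, not any particular vendor's API.

```python
# Whether the agent calls semantic_search or grep, both are retrieval:
# they decide what lands in the context window at answer time.
TOOLS = [
    {
        "name": "semantic_search",
        "description": "Find passages whose meaning is close to the query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "k": {"type": "integer"}},
            "required": ["query"],
        },
    },
    {
        "name": "grep",
        "description": "Return lines matching an exact pattern, like the grep CLI.",
        "parameters": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}, "path": {"type": "string"}},
            "required": ["pattern"],
        },
    },
]
```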

Key Takeaway: The distinction between RAG and other AI augmentation approaches is increasingly blurred. The fundamental goal is getting the right information into the context at the right time, regardless of the specific retrieval mechanism.

What's the value of the office hours format for learning?

Several participants expressed surprise at how valuable they found the office hours sessions. One noted, "I thought they wouldn't be useful, but I'm surprised with the quality of the questions being asked."

These interactive sessions provide an opportunity to hear how others are applying the course concepts and to discuss specific challenges that might not be covered in the structured lectures. The questions often reveal practical implementation issues that many participants are facing but might not have articulated themselves.

The conversations also help connect theoretical concepts to real-world applications, making the material more concrete and actionable. For example, our discussion about monitoring cosine distances in production systems provided a practical perspective on evaluation that complements the more structured content on evaluation frameworks.

Key Takeaway: Interactive learning formats like office hours provide valuable perspectives that complement structured course content, particularly for understanding how concepts apply to diverse real-world scenarios.

How should we pace the course to maximize learning?

When asked about the pacing of the course, I acknowledged that many participants are finding it challenging to keep up with all the material. One suggestion was to include a week in the middle with no new material to allow people to catch up, which received positive feedback.

I noted that Week 3 is intentionally lighter, with only a 40-minute video and no notebooks, designed as a catch-up week. However, I recognized that I should make this more explicit to help participants plan their time.

The six-week format provides more depth than a one-week intensive course would allow, but it requires consistent engagement to get the full benefit. Finding the right balance between comprehensive coverage and manageable pacing remains a challenge.

Key Takeaway: Learning complex technical skills requires finding the right balance between depth of content and time for absorption and practice. Building explicit catch-up periods into courses can help participants manage their learning journey more effectively.

What can we learn from leaked system prompts like Claude's?

One participant asked about the recently leaked Anthropic Claude prompt, which was reportedly around 30,000 tokens: "Where does it leave room for actual content to be processed? Is it even realistic or just hype?"

I wasn't surprised by the size of this prompt: it makes sense for the Claude web app experience, which includes tools for fetching information. The API version likely has a smaller system prompt, but it's still substantial if web search capabilities are included.

This reveals how much can be done through prompting without changing model weights. It's remarkable that models can now process 30,000-token system messages when, just two years ago, the entire context window was limited to 32K tokens.

The existence of such extensive system prompts raises questions about where certain capabilities should reside - in the prompt or in the model weights. For example, if a fetch tool were baked into the model weights, what would happen if you named your custom tool "web_search" and the model tried to call a hardcoded "fetch" function?

Key Takeaway: Large system prompts demonstrate how much functionality can be implemented through instructions rather than model training. This creates flexibility but also raises important questions about the boundary between prompt engineering and model architecture.


FAQs

How can I balance the course with my regular work schedule?

Many participants find balancing the course with their day job challenging. The course requires time for watching lectures, completing exercises, and participating in discussions. Consider setting aside specific time slots in your schedule for course activities and prioritize what aspects are most valuable to you. Remember that you can always revisit materials after the course ends if you're unable to complete everything during the active weeks.

Will course materials remain available after the course ends?

Yes, all course materials including videos, notebooks, and exercises will remain accessible after the course concludes. The Slack channel will also stay active, allowing you to continue asking questions and collaborating with other participants. While structured bi-weekly meetings won't continue, you'll still have access to all resources and can work through them at your own pace.

How active will the community be after the course ends?

The community's activity level will largely depend on participant engagement. While there won't be formal scheduled sessions after the course, the instructors will check the Slack channel regularly and share relevant resources. The value you get from the community will correlate with how much you contribute to it. Many valuable connections happen through direct messages rather than in public channels.

What if I'm struggling to keep up with the course pace?

Week 3 is intentionally lighter, with only a 40-minute video and no notebooks, providing an opportunity to catch up on previous materials. The course team is considering adding a "break week" in future cohorts to give participants more time to process information and complete exercises. Don't worry if you can't complete everything during the course timeframe; the materials will remain available afterward.

How can I apply what I'm learning to my actual projects?

The most effective way to apply course concepts to your work is to start with the exercises to build foundational understanding, then gradually incorporate techniques into your projects. Some participants find it helpful to wait until after the course to fully implement what they've learned, as this allows them to focus on understanding the concepts first. For more personalized guidance, reaching out about consulting engagements after the course can be beneficial.

What's the best approach to RAG (Retrieval-Augmented Generation) for technical documentation?

When working with technical documentation, consider these approaches:

  1. Start by focusing on getting retrieval right before worrying about other aspects

  2. Use document structure (sections, chapters, titles) to improve chunking

  3. Consider a combination of semantic search, full-text search, and traversal approaches

  4. Monitor metrics like cosine distance to evaluate retrieval quality

  5. Begin with a simple implementation that works for most of your documents rather than trying to solve every edge case immediately

How should I handle irrelevant data in my vector database?

The impact of irrelevant data depends on your specific model and use case. Modern language models are optimized for high recall, which can make them sensitive to low precision issues. Consider whether irrelevant data is merely noise or actually conflicting/adversarial. For conflicting information, you may need to implement authority rules (like prioritizing certain document types) or time-based weighting. Rather than trying to perfect your data filtering upfront, start with a basic implementation, collect examples, and iterate based on actual performance.

Are vector databases providing built-in retrieval quality measurements?

While some vector databases may offer metrics, it's generally better to implement your own monitoring. Focus on tracking metrics like average cosine distance of your queries and monitor how these change over time or across different user segments. This approach allows you to detect shifts in data patterns or user behavior that might affect retrieval quality. Looking at changes in these metrics is often more valuable than the absolute values themselves.