AI Governance and Test-driven LLM Productionalization @ Iteratec


Date
Jun 17, 2024 6:00 PM

Dear Data Enthusiasts,

We will continue talking about LLMs and AI Governance at our next Meetup. Join us @ Iteratec on 17.06. to learn more about the specific challenges of, and approaches to, fighting model bias, data drift and LLM hallucination. In addition, we will talk about lessons learned when testing LLM-based conversational AI systems.

ML Governance, Trustworthy AI and Implications for LLMs
Thomas Jirku, IBM - Sr. Technical Sales Data Science & AI

We have all seen what happens when LLMs hallucinate or algorithms wrongly predict the obvious. What are the challenges, and what can we do in practice to increase trust in AI systems, mitigate the risks of putting AI into production, and set the right level of expectations on both sides, providers and consumers?

Thomas is a certified Data Scientist & Technical Sales / Solution Architect in the Technical Sales team of Data Science & AI at IBM. He has been advising customers in Austria on the secure implementation of AI projects since 2015. Importantly, IBM's approach to AI governance and trustworthiness goes beyond technical tooling and challenges data teams to think about the whole end-to-end integration and maintenance of ML models.

No baseline? No benchmarks? No biggie! Lessons learned testing LLM-based conversational AI systems
Katherine Munro, Swisscom - Data Scientist, Computational Linguist (Remote)

What happens when you take a working chatbot that’s already serving thousands of customers a day in four different languages, and try to deliver an even better experience using Large Language Models? Good question. It’s well known that evaluating and comparing LLMs is tricky. Benchmark datasets can be hard to come by for your specific task, and metrics such as BLEU and ROUGE are imperfect. But that’s all rather academic: how are industry data teams tackling these issues when incorporating LLMs into production projects? In my work as a Conversational AI Engineer at Switzerland’s largest telecommunications provider, I’m doing exactly that. So join me and learn:

  • The challenges of evaluating an evolving PoC against a working product: how to compare their fundamentally different ways of working, and to identify metrics that fairly assess both.
  • How we’re using different types of testing at different stages of the PoC-to-production process: including inter-team code swaps, scaled up simulations and experiments with real customers.
  • Practical pros and cons of different test types, and when to be precious versus pragmatic about data science testing fundamentals.
  • Wins, losses and WTF moments: Open questions, lessons learned, and our evolving set of testing best practices.

Whether you’re a data leader, product manager, or deep in the trenches building LLM-powered solutions yourself, come along and take away some ideas, warnings, and inspiration from our test-driven approach to building production-ready conversational AI.

Katherine is a Data Scientist and Computational Linguist, conducting R&D and strategy consulting in AI, Natural Language Processing (NLP) and data science. She is a speaker, writer, teacher, and passionate workaholic. Entering the tech world with R&D roles at Mercedes-Benz and the Fraunhofer Institute, specialising in user interfaces and Natural Language Understanding, Katherine transitioned to data science in the e-commerce and insurance domains, before landing her current role building conversational AI systems. In her free time, Katherine is a tech blogger, a LinkedIn Learning trainer, and a volunteer for diverse initiatives helping women and girls enter tech careers.

🎤🎤 Open Mic
We are going to open up the stage after the talks for community announcements. If you’d like to announce something, open this slide deck, make sure you are signed in with a Google account, and click “View Only” -> “Request Edit Access”. Explain in the text box what you want to announce, and we’ll give you edit access to the slide deck. 🎤🎤

Also, please note that photos may be taken during the event and later posted on VDSG’s or Magenta’s social media pages. Please notify us if you do not agree.

Attention attendees with food allergies: please be aware that the food and drinks provided may contain or come into contact with common allergens, such as dairy, eggs, wheat, soybeans, tree nuts, peanuts, fish, or shellfish.

VDSG Team
community building

We are a nonprofit association promoting knowledge about data science. We connect data scientists in Europe and all around the world. Our members are passionate data scientists from various areas of research and industry.