At ProCogia, where Snowflake is at the core of our data strategy transformations, Snowflake Summit 2024 was a highly anticipated event. One of our consultants, Anant Sharma, a Senior Data Architect, presented at the summit on how we helped T-Mobile migrate 2.5 PB of data to Snowflake and reduce costs.
Anant Sharma (center), Senior Data Architect at ProCogia, presenting at the Snowflake Summit 2024
Although I couldn’t be there in person, the online coverage provided an in-depth view of the latest advancements and breakthroughs. The summit introduced a range of new capabilities and partnerships that promise to enhance our service offerings. In this blog, I will share the most significant updates from the event and discuss how these innovations will empower us to deliver cutting-edge data solutions to our clients. Most of the sessions centered on key features like Cortex AI, new governance tools, Snowpark improvements, and better app development capabilities. Some of the features are GA (Generally Available), while others are still in preview.
AI Capabilities
Document AI will soon be GA:
Document AI efficiently extracts fields from multiple documents in batch using SQL and stores the outputs in Snowflake tables. This gives Data Engineers more flexibility when working with PDF document sources.
New capabilities for Cortex AI:
Cortex is a fully managed service that aids enterprise AI with no-code development, serverless fine-tuning, and managed services for building AI applications like chat-with-data interfaces.
The new features in Cortex AI include chat experiences that should make it quicker to build chatbots over both structured and unstructured data. To prove this, the presenters invited a random attendee on stage to build a chatbot within minutes.
Snowflake Copilot is now GA:
You can ask questions about your data, generate table descriptions, and perform a range of data analysis tasks. Over 20,000 users tested it while it was in preview.
Universal search is now GA:
It uses AI for natural language searches across multiple data types and sources, giving you a one-stop search portal for everything within Snowflake.
From Snowsight, you can search tables, views, functions, procedures, Snowflake Marketplace listings, documentation, worksheets, dashboards, and Internal Marketplace listings all in one place.
Uniform Listing Locator:
A ULL is a kind of URL in the data cloud: a unique ID for a listing in the AI Data Cloud. If you have a listing you like, you can plug its ULL into your SQL query and start using it without requiring elevated privileges.
Data Governance
Data Classification Interface in Snowsight / Automatic Classification:
Snowflake now has a Sensitive Data Classification interface in the Snowsight UI. It simplifies data classification with auto-tagging and auto-classification.
Automatic Tag Propagation:
With Automatic Tag Propagation, tags stick to data wherever it moves. You can now see the downstream objects impacted by modifications. Using propagating tags in workflows can help protect downstream columns that contain PII.
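To make the idea concrete, here is a minimal conceptual sketch of tag propagation over a lineage graph. It is not Snowflake's implementation or API. The column names, the `lineage` mapping, and the `propagate_tags` helper are all hypothetical and exist only for illustration.

```python
from collections import deque


def propagate_tags(lineage, tags):
    """Copy each tagged column's tags to every column derived from it.

    lineage: dict mapping a source column to the columns built from it.
    tags:    dict mapping a column to its set of tags.
    Both structures are hypothetical stand-ins for real lineage metadata.
    """
    result = {col: set(col_tags) for col, col_tags in tags.items()}
    queue = deque(tags)  # start from every column that already has tags
    while queue:
        col = queue.popleft()
        for child in lineage.get(col, []):
            before = len(result.setdefault(child, set()))
            result[child] |= result[col]
            # Revisit the child only if it gained a tag, so cycles terminate.
            if len(result[child]) > before:
                queue.append(child)
    return result


# Hypothetical lineage: raw -> staging -> marts
lineage = {
    "raw.customers.email": ["staging.customers.email"],
    "staging.customers.email": ["marts.contacts.email"],
}
tags = {"raw.customers.email": {"PII"}}

propagated = propagate_tags(lineage, tags)
```

In this sketch, tagging the raw column once is enough for the staging and marts columns to carry the same PII tag, which is the behavior that makes masking policies easier to apply downstream.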
Trust Center is now in public preview (not available in trial accounts):
You can use the Trust Center to evaluate, scan, and monitor your account for security risks. The Trust Center evaluates your account against security recommendations and provides centralized monitoring for security and compliance risks across clouds.
Cost Monitoring Interface:
This is a single, centralized interface to manage costs in Snowflake. It gives visibility into account and organizational level spend and usage metrics, while also providing cost insights. Budgets and Resource Monitors can be set directly from the same interface.
Snowflake Trail:
Snowflake Trail is a set of Snowflake capabilities for developers to better monitor, troubleshoot, debug, and take action on pipelines, apps, user code, and compute utilization.
It is Snowflake's observability offering and requires no agent installation. There will be a user interface where you can view and collect built-in metrics, logs, and traces. The event table can integrate with popular observability partners.
It was built with OpenTelemetry standards. Snowflake telemetry and notification capabilities integrate with some of the most favored developer tools, including Datadog, Grafana, Metaplane, Monte Carlo, PagerDuty and Slack. Or simply use Snowsight, where developers can monitor and trace their pipelines, apps and runtime usage directly within Snowflake.
Better support for open storage architectures – Introducing Polaris:
Snowflake highlighted their support for open storage architectures by announcing the GA of Iceberg tables and introducing Polaris.
They unveiled the Polaris Catalog for the Apache Iceberg open table format, which is often used for implementing Data Lakes and Data Lakehouses. Polaris Catalog will enhance interoperability across various data processing engines and reduce vendor lock-in. It can be hosted on Snowflake’s AI Data Cloud or self-hosted, offering flexibility and robust governance features.
To use it, create an integration object in Snowflake that connects to the Polaris Catalog, then sync Iceberg tables with Snowflake Horizon.
Internal Marketplace (private preview):
This will enable secure collaboration with a single directory of data products curated within organizations.
Advanced analytics:
Snowflake announced that Time Series ASOF JOIN is now GA. ASOF JOIN is a type of join that pairs records from two tables based on their proximity (usually temporal). For each row on the left side of the join, the operation finds the closest matching row on the right side.
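The matching logic can be sketched in plain Python. This is an illustration of the idea only, not Snowflake's implementation. It assumes the match condition is "the latest right-side row at or before the left-side timestamp", and the `trades` and `quotes` data and the `asof_join` helper are hypothetical.

```python
from bisect import bisect_right


def asof_join(left, right, key="ts"):
    """Pair each left row with the latest right row whose key is <= the left key.

    Returns a list of (left_row, right_row_or_None) tuples in left-row order.
    """
    right_sorted = sorted(right, key=lambda row: row[key])
    right_keys = [row[key] for row in right_sorted]
    joined = []
    for row in left:
        # Index of the first right key strictly greater than the left key.
        idx = bisect_right(right_keys, row[key])
        match = right_sorted[idx - 1] if idx > 0 else None
        joined.append((row, match))
    return joined


# Hypothetical data: each trade should pick up the most recent quote.
trades = [{"ts": 3, "qty": 100}, {"ts": 7, "qty": 50}, {"ts": 1, "qty": 10}]
quotes = [{"ts": 2, "price": 10.0}, {"ts": 5, "price": 10.5}, {"ts": 7, "price": 11.0}]

result = asof_join(trades, quotes)
```

Here the trade at `ts=3` pairs with the quote at `ts=2`, the trade at `ts=7` pairs with the quote at `ts=7`, and the trade at `ts=1` has no earlier quote, so it gets no match.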
Other analytics features are Time Series RANGE BETWEEN (public preview soon) and Higher-order Functions (GA).
Snowflake Performance Index: recurring query duration improved by 27%:
Using its internal data, Snowflake observed that query duration for its customers’ stable workloads improved by 27% from August 25, 2022 to April 30, 2024. The measurement used a group of customer workloads that are stable and comparable in both number of queries and data processed over that period. The reduction in query duration came from many factors, including hardware and software improvements and customer optimizations.
Better DevOps Capabilities:
Git integration is now in public preview, alongside improved Database Change Management, Snowflake Notebooks (also in public preview), and the Snowflake CLI. You can schedule Notebooks and use the container runtime with them.
Snowpark Pandas API in public preview:
For ML operations and ML development.
The API enables running pandas code directly on Snowflake data. You get the same pandas-native experience along with the advantage of parallelization and the data governance and security benefits of Snowflake. Snowflake’s backend optimizations ensure that pandas operations are executed efficiently, often outperforming traditional methods. The Snowflake Python API unifies all Snowflake Python libraries (including connector, core, Snowpark, ML, and more).
Conclusion
With these features, we can see that Snowflake’s mission is to be the enterprise AI Data Cloud: a single platform that lets users run development tasks (building apps, pipelines, and ML models) right alongside their data.
Snowflake is also evolving fast to become more serverless, as the technologies behind its AI features require it. The built-in, unified set of capabilities it offers to govern and discover data in the AI Data Cloud is commendable.
Ready to Transform Your Data Strategy?
Contact ProCogia today to learn how our expertise with Snowflake can drive your business forward. Schedule a consultation with our team of data experts and see how we can help you leverage the latest advancements to optimize your data operations and reduce costs.