Introduction
In October 2023, Snowflake announced the general availability of a feature that tracks the activity of functions and stored procedures by capturing log messages and trace events directly from specified code whenever it executes.
These logs and traces are seamlessly routed to a secure, customer-owned Event Table which you can query using SQL to troubleshoot your applications or gain insights on performance and behavior of your code. The Event Table can also be used in conjunction with other features like Snowflake Alerts and Email Notifications to be notified of new events and errors in running applications.
According to the supporting documentation, it can capture the following:
- Log messages that are independent, detailed messages with information about the state of a specific piece of your code.
- Trace events with structured data you can use to get information spanning and grouping multiple parts of your code.
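To make the distinction concrete, below is a minimal sketch of a Python handler that emits both kinds of signal. The logger name, event name, and attributes are illustrative; the standard logging module produces log messages, while the snowflake-telemetry-python package (added through the procedure's PACKAGES clause) provides telemetry.add_event for trace events.

import logging
from snowflake import telemetry  # from the snowflake-telemetry-python package

def process_batch(session, batch_id: str) -> str:
    logger = logging.getLogger("batch_handler")  # hypothetical logger name
    # A log message: an independent, detailed record of handler state.
    logger.info("Started processing batch %s", batch_id)
    # A trace event: structured data attached to the current span.
    telemetry.add_event("batch_processed", {"batch_id": batch_id})
    return "done"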
With this feature, data engineers and developers can easily structure their code to capture logs and traces across UDFs, UDTFs, Native Apps, SnowServices and stored procedures in all supported languages: Java, Scala, JavaScript, Python and Snowflake Scripting. The collected data is automatically ingested into the Snowflake Event Table with no manual intervention.
Snowflake uses a serverless compute model for ingesting events into the Event Table. Costs for this include compute resources and cloud services, billed separately.
Usage can be monitored via the EVENT_USAGE_HISTORY view, as sketched below. To minimize costs, avoid logging frequently over long periods of time, and set the log level for specific functions or procedures instead of globally.
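A minimal sketch of such a usage check, assuming an authenticated Snowpark session with access to the SNOWFLAKE.ACCOUNT_USAGE share:

# Summarize the last seven days of event-ingestion usage.
usage_df = session.sql("""
    SELECT START_TIME, END_TIME, CREDITS_USED, BYTES_INGESTED
    FROM SNOWFLAKE.ACCOUNT_USAGE.EVENT_USAGE_HISTORY
    WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY START_TIME DESC
""")
usage_df.show()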
Event Table Logging Use Cases
The most common use case for this feature is monitoring and observability, two areas that developers care deeply about when building and managing applications effectively.
Before the introduction of logging and tracing via Event Tables, there was no easy way to capture application logs and traces in Snowflake, nor a centralized location to store and consume them.
In one of ProCogia’s client projects, there was a business requirement to develop a dynamic process that handles invalid record ingestion automatically, without manual intervention. A Snowflake Event Table was employed for this data validation process.
To help detect data quality issues during the data engineering process, we implemented validation checks against incoming data. These checks address data errors such as a start time that comes after an end time for a maritime voyage, or an impossible value for a field, such as a negative value for an engine “running hours” field.
Design Approach
A simple Python script in Snowpark flattens the raw JSON files, identifying and excluding invalid records while automatically logging the details in an Event Table. An alert mechanism was also implemented to monitor for such flawed records and send email notifications whenever they are detected.
To achieve this, we scheduled an automated Snowflake Task to COPY the JSON files into a RAW table. We implemented a procedure in Snowpark that flattens the JSON data per business requirements and is programmed to identify erroneous records and log them to an Event Table. Finally, an alert was configured to send an email notification whenever a matching occurrence appears in the Event Table, per the pre-defined scopes. A sketch of the alert configuration follows.
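The alert itself can be expressed in SQL and created from Snowpark. The sketch below is illustrative rather than an exact production configuration: the warehouse, notification integration, recipient, and event table names are hypothetical placeholders.

# Sketch: an alert that emails when new ERROR-level log records arrive.
# my_wh, my_email_int, the recipient, and MY_DB.MY_SCHEMA.ALL_EVENTS are placeholders.
session.sql("""
    CREATE OR REPLACE ALERT invalid_record_alert
      WAREHOUSE = my_wh
      SCHEDULE = '5 MINUTE'
      IF (EXISTS (
        SELECT 1
        FROM MY_DB.MY_SCHEMA.ALL_EVENTS
        WHERE RECORD_TYPE = 'LOG'
          AND RECORD['severity_text']::STRING = 'ERROR'
          AND TIMESTAMP > SNOWFLAKE.ALERT.LAST_SUCCESSFUL_SCHEDULED_TIME()
      ))
      THEN CALL SYSTEM$SEND_EMAIL(
        'my_email_int',
        'data-team@example.com',
        'Invalid records detected',
        'New error-level events were logged to the event table.'
      )
""").collect()

# Alerts are created suspended and must be resumed to start running.
session.sql("ALTER ALERT invalid_record_alert RESUME").collect()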
An example of a validation instruction may be defined as follows:
session.sql("SELECT COUNT(*) AS num_errors FROM events WHERE StartTime > EndTime")
The code executes this SQL and expects a count of zero if no errors were detected. If the count in the resulting dataframe is greater than zero, the code logs the issue, so that these errors are recorded in the Snowflake Event Table.
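Put together, the check might look like the following sketch; the table, columns, and logger name are illustrative, and the code is assumed to run inside a stored procedure handler with a Snowpark session available:

import logging

logger = logging.getLogger("data_validation")  # hypothetical logger name

# Run the validation query and pull the count out of the resulting dataframe.
num_errors = session.sql(
    "SELECT COUNT(*) AS num_errors FROM events WHERE StartTime > EndTime"
).collect()[0]["NUM_ERRORS"]

if num_errors > 0:
    # Routed to the Event Table when executed inside a stored procedure.
    logger.error("Found %s records with StartTime after EndTime", num_errors)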
Comparison to AWS and Azure Services
Snowflake’s feature for capturing logs and traces via Event Tables is comparable to AWS and Azure tools and services for logging, monitoring, and observability. It works much like AWS CloudWatch Logs, Azure Monitor, Azure Application Insights and Azure Log Analytics, which allow you to capture and monitor logs and analyze the performance of applications across various cloud resources.
Unlike comparable services in other cloud environments, however, the Snowflake Event Table is tightly coupled with the Snowflake Data Cloud; it is not a separate service that needs to be set up.
Snowflake Event Table Columns
An Event Table uses a natively predefined set of columns to record events, logs, and trace events from user-defined functions and stored procedures. There can be only one active Event Table per account.
After the account administrator creates the Event Table, the severity level for log messages and the verbosity of trace events should be set. Note that you can also set logging at the account, database, schema, or function level.
use role accountadmin;
create event table ALL_EVENTS;
-- set the account's event table using its fully qualified name
alter account set event_table = <database>.<schema>.ALL_EVENTS;
alter session set log_level = WARN;
alter session set trace_level = ON_EVENT;
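To keep logging targeted (and costs down), the log level can also be scoped to a single object rather than the session or account. A sketch, assuming a Snowpark session and hypothetical my_schema and my_proc() objects:

# Scope the log level to one schema or one stored procedure (placeholders).
session.sql("ALTER SCHEMA my_schema SET LOG_LEVEL = WARN").collect()
session.sql("ALTER PROCEDURE my_proc() SET LOG_LEVEL = ERROR").collect()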
The Event Table has several columns; listed below are some of the most important for logging activities.
Column | Data Type | Description |
TIMESTAMP | TIMESTAMP_NTZ | The UTC timestamp when an event was created. For events representing a span of time, this is the end of the time span. |
TRACE | OBJECT | Tracing context for all signal types. Contains string values trace_id and span_id. |
SCOPE | OBJECT | Scopes for events. For example, class names for logs. |
RECORD_TYPE | STRING | The event type. One of the following: LOG, SPAN, SPAN_EVENT. |
VALUE | VARIANT | Primary event value. |
Passing Logs Through Snowpark to the Event Table
With the Event Table set, simply passing logging information through a block of code will not work directly. To pass logging information to the Event Table in Snowflake, where it can be queried later, there are a few things to note:
- The logging activities will only be passed to Snowflake if your code is executed through a stored procedure. So, create a procedure (either temporary or permanent), then call the procedure.
- The log messages must be explicitly stated, meaning we must properly catch exceptions and log them.
- It may take a few minutes for the logs to appear in the Snowflake Event Table.
Let us take an example. We assume that the needed libraries are imported and an authenticated Snowpark session is available.
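The sketch below registers a temporary stored procedure whose handler emits a log message; the procedure and logger names are illustrative:

import logging
from snowflake.snowpark import Session

def log_demo(session: Session) -> str:
    logger = logging.getLogger("demo_logger")  # hypothetical logger name
    logger.warning("This message is routed to the Event Table")
    return "done"

# Register as a temporary stored procedure, then call it so the log is captured.
log_demo_sp = session.sproc.register(
    log_demo,
    name="log_demo_sp",
    packages=["snowflake-snowpark-python"],
    replace=True,
)
print(log_demo_sp())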
Let us now extend the same example to deliberately trigger an error, and catch and log it explicitly.
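A sketch of the same handler with a deliberate failure (an intentionally wrong table name) and explicit exception handling:

import logging
from snowflake.snowpark import Session

def log_demo_safe(session: Session) -> str:
    logger = logging.getLogger("demo_logger")  # hypothetical logger name
    try:
        # Deliberate mistake for illustration: this table does not exist.
        session.sql("SELECT COUNT(*) FROM events_typo").collect()
        return "done"
    except Exception as exc:
        # Catch the exception and log it explicitly so it reaches the Event Table.
        logger.error("Query failed: %s", exc)
        raise

After a few minutes, the logged error appears in the Event Table as a row with RECORD_TYPE = 'LOG', which can then be queried with SQL.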
Additionally, the same user may join the Event Table with the INFORMATION_SCHEMA query history to extract the actual query text:
with cte as (
    select TIMESTAMP,
           RESOURCE_ATTRIBUTES['snow.query.id']::text as id,
           VALUE
    from ALL_EVENTS
    where SCOPE['name'] = 'infosec_glue'
)
select cte.*, qh.QUERY_TEXT, qh.QUERY_TYPE, qh.USER_NAME, qh.EXECUTION_STATUS, qh.ERROR_MESSAGE
from cte
join TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(RESULT_LIMIT => 1000)) qh
    on qh.QUERY_ID = cte.id;
This is one such example of how we can take advantage of the event table within the Snowflake platform.
Conclusion
In Snowflake, you can effectively store your log and trace data in an event table, a special kind of table with a predefined set of columns. You can record the activity of your Snowflake function and procedure handler code (including code you write using Snowpark APIs) as it executes.
Once you’ve created an event table and associated it with your account, you use an API in the language of your handler to emit log messages from handler code. These captured log and trace data can be very useful in your data engineering process, and you can query the data to analyze the results.
Interested in enhancing your data management strategies? Learn more about our services and ProCogia’s partnership with Snowflake. Contact us today to discover how we can help you leverage the power of Snowflake’s Event Tables to elevate your data solutions.