Capturing Snowpark Logs in Snowflake’s Event Tables


Introduction 

In October 2023, Snowflake announced the general availability of Event Tables, a feature that tracks the activity of functions and stored procedures by capturing log messages and trace events directly from your code whenever it executes.

These logs and traces are seamlessly routed to a secure, customer-owned Event Table, which you can query using SQL to troubleshoot your applications or gain insight into the performance and behavior of your code. The Event Table can also be used in conjunction with other features, like Snowflake Alerts and Email Notifications, to notify you of new events and errors in running applications.

According to the supporting documentation, it can capture the following:

  • Log messages that are independent, detailed messages with information about the state of a specific piece of your code. 
  • Trace events with structured data you can use to get information spanning and grouping multiple parts of your code. 

With this feature, data engineers and developers can easily structure their code to capture logs and traces across UDFs, UDTFs, Native Apps, SnowServices, and Stored Procedures in all supported handler languages: Java, Scala, JavaScript, Python, and Snowflake Scripting. The collected data is automatically ingested into the Snowflake Event Table with no manual intervention.

Event Table

Snowflake uses a serverless compute model for ingesting events into the Event Table. Costs for this include compute resources and cloud services, billed separately.  

Usage can be monitored via the EVENT_USAGE_HISTORY View. To minimize costs, avoid logging frequently over a long period of time and set the log level for specific functions or procedures instead of globally.
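For example, a quick Snowpark sketch of that monitoring step; connection_parameters is a placeholder for your account credentials and is not part of the original setup:

from snowflake.snowpark import Session

# connection_parameters is a placeholder dict with your account credentials.
session = Session.builder.configs(connection_parameters).create()

# EVENT_USAGE_HISTORY lives in the SNOWFLAKE.ACCOUNT_USAGE share; this pulls
# event-ingestion cost for the last 7 days.
session.sql("""
    select start_time, end_time, credits_used, bytes_ingested
    from snowflake.account_usage.event_usage_history
    where start_time > dateadd('day', -7, current_timestamp())
    order by start_time desc
""").show()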

Event Table Logging Use Cases 

The most common use case for this feature is monitoring and observability, two areas that most developers care about when building and managing applications.

Before the introduction of logging and tracing via Event Tables in Snowflake, there was no easy way to capture application logs and traces. There was also no centralized location to store and consume these logs and traces. 

In one of ProCogia's client projects, there was a business requirement to develop a dynamic process that handles the ingestion of invalid records automatically, without manual intervention. The Snowflake Event Table was employed for this data validation process.

To help detect data quality issues during the data engineering process, we implemented validation checks against incoming data. These checks address data errors such as a start time that comes after an end time for a maritime voyage, or an impossible value for a field, such as a negative value for an engine “running hours” field.

Design Approach 

A simple Python script in Snowpark flattens the raw JSON files, identifying and excluding the invalid records while automatically logging the details to an Event Table. An alert mechanism was also implemented to monitor the table and send email notifications whenever such flawed records are detected.

SNOWPARK

To achieve this, we scheduled an automated Snowflake task to COPY JSON files into a RAW table. We then implemented a procedure in Snowpark that flattens the JSON data per the business requirements. The procedure identifies erroneous records and logs them to an Event Table. Finally, an alert was configured to send an email notification whenever a matching occurrence appears in the Event Table, per the pre-defined scopes.
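The exact implementation is client-specific, but a rough sketch of this wiring might look as follows; the task, stage, table, warehouse, alert, and email-integration names are all hypothetical placeholders, as is connection_parameters:

from snowflake.snowpark import Session

# connection_parameters is a placeholder dict with your account credentials.
session = Session.builder.configs(connection_parameters).create()

# 1. Scheduled task that copies staged JSON files into a RAW landing table.
session.sql("""
    create or replace task LOAD_RAW_JSON
      warehouse = COMPUTE_WH
      schedule = '60 minute'
    as
      copy into RAW_EVENTS from @JSON_STAGE file_format = (type = 'json')
""").collect()

# 2. Alert that emails the team when new ERROR-level logs land in the event
#    table (for log records, severity_text sits in the RECORD object column).
session.sql("""
    create or replace alert INVALID_RECORD_ALERT
      warehouse = COMPUTE_WH
      schedule = '60 minute'
      if (exists (
        select 1
        from ALL_EVENTS
        where record_type = 'LOG'
          and record['severity_text']::string = 'ERROR'
          and timestamp > dateadd('hour', -1, current_timestamp())
      ))
      then
        call system$send_email(
          'my_email_integration',
          'data-team@example.com',
          'Invalid records detected',
          'New error logs were written to the event table in the last hour.')
""").collect()

# Tasks and alerts are created suspended; resume them to start the schedule.
session.sql("alter task LOAD_RAW_JSON resume").collect()
session.sql("alter alert INVALID_RECORD_ALERT resume").collect()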

An example of a validation instruction may be defined as follows:

 

df = session.sql("select count(*) as num_errors from events where StartTime > EndTime")

The code executes this SQL and expects a count of zero if no errors were detected. If the count in the generated dataframe is greater than zero, the code logs the issue, so that these errors end up in the Snowflake Event Table.
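As a minimal sketch, assuming this check runs inside a stored procedure handler so that standard Python logging calls are routed to the Event Table (the logger name is our own choice):

import logging

from snowflake.snowpark import Session

logger = logging.getLogger("voyage_validation")  # shows up in SCOPE['name']

def validate_voyages(session: Session) -> int:
    # A zero count means the validation passed.
    num_errors = session.sql(
        "select count(*) as num_errors from events where StartTime > EndTime"
    ).collect()[0]["NUM_ERRORS"]
    if num_errors > 0:
        # Captured in the event table when executed inside a procedure.
        logger.error("%s voyage records have StartTime > EndTime", num_errors)
    return num_errors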

Comparison to AWS and Azure Services

Snowflake’s feature for capturing logs and traces via Event Tables is comparable to the AWS and Azure tools and services for logging, monitoring, and observability. It plays a similar role to AWS CloudWatch Logs, Azure Monitor, Azure Application Insights, and Azure Log Analytics, which let you capture and monitor logs and analyze the performance of applications across various cloud resources.

Unlike comparable services in other cloud environments, the Snowflake Event Table is tightly coupled with the Snowflake Data Cloud; it is not a separate service that needs to be set up.

Snowflake Event Table Columns

An Event Table uses a predefined set of columns to record events, log messages, and trace events from user-defined functions and stored procedures. There can be only one active Event Table per account.

After the account admin creates the Event Table, the severity level for log messages and the verbosity of trace events should be set. Note that these levels can also be set at the account, database, schema, or function level.

use role accountadmin;

create event table ALL_EVENTS;

alter account set event_table = <db_name>.<schema_name>.ALL_EVENTS;

alter session set log_level = WARN;

alter session set trace_level = ON_EVENT;
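The session-level statements above apply to everything in the session. To scope logging to a single object instead, as mentioned, something like the following sketch should work; the procedure and database names here are hypothetical:

from snowflake.snowpark import Session

# connection_parameters is a placeholder dict with your account credentials.
session = Session.builder.configs(connection_parameters).create()

# Log at DEBUG for one (hypothetical) procedure only; the argument-type
# signature is part of the name.
session.sql("alter procedure FLATTEN_RAW_JSON() set log_level = 'DEBUG'").collect()

# Or set a default level for everything in one database.
session.sql("alter database MY_DB set log_level = 'INFO'").collect()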

The Event Table has several columns; listed below are some of the most important ones for logging activities.

  • TIMESTAMP (TIMESTAMP_NTZ): The UTC timestamp when an event was created. For events representing a span of time, this is the end of the time span.
  • TRACE (OBJECT): Tracing context for all signal types. Contains string values trace_id and span_id.
  • SCOPE (OBJECT): Scopes for events. For example, class names for logs.
  • RECORD_TYPE (STRING): The event type. One of the following: LOG for a log message; SPAN for user-defined function invocations performed sequentially on the same thread; SPAN_EVENT for a single trace event (a single query can emit more than one SPAN_EVENT).
  • VALUE (VARIANT): The primary event value.
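To see these columns in practice, here is a quick query sketch against the ALL_EVENTS table created earlier (again assuming a placeholder connection_parameters):

from snowflake.snowpark import Session

# connection_parameters is a placeholder dict with your account credentials.
session = Session.builder.configs(connection_parameters).create()

# Most recent log records, with scope and trace context unpacked.
session.sql("""
    select timestamp,
           scope['name']::string as scope_name,
           trace['trace_id']::string as trace_id,
           value
    from ALL_EVENTS
    where record_type = 'LOG'
    order by timestamp desc
    limit 20
""").show()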

Passing Logs Through SNOWPARK to Event Table 

With the Event Table set up, simply passing logging information through an ad-hoc block of code will not work directly.

To pass logging information to the Event Table in Snowflake, where it can be queried later, there are a few things to note:

  • The logging activities are only passed to Snowflake if your code is executed through a stored procedure. So, create a procedure (either temporary or permanent), then call the procedure. 
  • The log messages must be emitted explicitly, meaning we must properly catch exceptions and log them.  
  • It may take a few minutes for the logs to appear in the Snowflake Event Table.

 

Let us take an example below.  

We assume that the libraries needed are imported.

[Image: Passing logs through Snowpark to the Event Table]
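A minimal sketch of that pattern follows; the procedure, logger, and table names are illustrative placeholders of our own, and connection_parameters stands in for your account credentials:

import logging

from snowflake.snowpark import Session

session = Session.builder.configs(connection_parameters).create()
session.add_packages("snowflake-snowpark-python")

logger = logging.getLogger("demo_logger")  # becomes SCOPE['name'] in the event table

def flatten_raw(session: Session) -> str:
    logger.info("Starting flatten of raw JSON records")
    n = session.sql("select count(*) as n from RAW_EVENTS").collect()[0]["N"]
    logger.info("RAW_EVENTS currently holds %s rows", n)
    return f"processed {n} rows"

# Register as a temporary stored procedure; log calls are only captured when
# the code runs inside a handler like this, not when executed ad hoc.
session.sproc.register(flatten_raw, name="FLATTEN_RAW_TEMP", replace=True)

session.call("FLATTEN_RAW_TEMP")  # logs appear in the event table a few minutes later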

Let us now use the same example with a deliberate error, and catch it.

[Image: Passing logs through Snowpark to the Event Table, with an error]
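A sketch of the error-handling variant along the same lines; the deliberately broken query stands in for whatever error the original example raised, and the key point is that the exception is caught and logged explicitly:

import logging

from snowflake.snowpark import Session

session = Session.builder.configs(connection_parameters).create()
session.add_packages("snowflake-snowpark-python")

logger = logging.getLogger("demo_logger")

def flatten_raw_with_error(session: Session) -> str:
    try:
        # Deliberate mistake: this table does not exist, so the query raises.
        session.sql("select count(*) from TABLE_THAT_DOES_NOT_EXIST").collect()
        return "ok"
    except Exception as exc:
        # Explicitly log the exception so it is captured in the event table;
        # an unhandled exception would surface as a failed query instead.
        logger.error("Flatten failed: %s", exc)
        return "failed"

session.sproc.register(flatten_raw_with_error, name="FLATTEN_RAW_ERR_TEMP", replace=True)
session.call("FLATTEN_RAW_ERR_TEMP")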

Additionally, the same user may join the Event Table with INFORMATION_SCHEMA query history to extract the actual query text.

with cte as (
    select timestamp,
           resource_attributes['snow.query.id']::text as id,
           value
    from ALL_EVENTS
    where scope['name'] = 'infosec_glue'
)
select cte.*, qh.query_text, qh.query_type, qh.user_name, qh.execution_status, qh.error_message
from cte
join table(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(RESULT_LIMIT => 1000)) qh
  on qh.query_id = cte.id;

This is one such example of how we can take advantage of the event table within the Snowflake platform.

Conclusion 

In Snowflake, you can effectively store your log and trace data in an event table, a special kind of table with a predefined set of columns. You can record the activity of your Snowflake function and procedure handler code (including code you write using Snowpark APIs) as it executes. 

Once you’ve created an event table and associated it with your account, you use an API in the language of your handler to emit log messages from handler code. These captured log and trace data can be very useful in your data engineering process, and you can query the data to analyze the results.

Interested in enhancing your data management strategies? Learn more about our services and ProCogia’s partnership with Snowflake. Contact us today to discover how we can help you leverage the power of Snowflake’s Event Tables to elevate your data solutions.
