Snowpark
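- DataFrame transformations are lazy: they are translated to SQL and executed in Snowflake, spread across the warehouse's nodes. Actions such as `df.collect()` or `df.to_pandas()` bring the full result into the calling Python process, which runs on a single node.
- Multi-threading
- Logging: in Java, create a static instance of an SLF4J logger; in Python, use `logging.getLogger()`.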
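For the Python side of the logging bullet, a minimal sketch: the logger is created once at module level (the counterpart of Java's static SLF4J instance) and the handler logs through it. The handler `normalize` and the logger name `my_udf` are hypothetical. Whether these records end up in a Snowflake event table depends on logging being configured on the Snowflake side, which these notes don't cover.

```python
import logging

# Created once when the handler module is loaded, like a static SLF4J logger
# in Java. The name "my_udf" is a hypothetical example.
logger = logging.getLogger("my_udf")


def normalize(value: float, scale: float) -> float:
    """Hypothetical UDF handler that logs through the module-level logger."""
    if scale == 0:
        logger.warning("scale is 0, returning input unchanged")
        return value
    logger.debug("normalizing %s by %s", value, scale)
    return value / scale


if __name__ == "__main__":
    # Local run only: print log records to stderr so they are visible.
    logging.basicConfig(level=logging.DEBUG)
    print(normalize(10.0, 4.0))  # prints 2.5
    print(normalize(3.0, 0.0))   # prints 3.0 (after a warning on stderr)
```

Because `logging.getLogger` returns the same object for the same name, every call into the handler reuses that single logger instead of creating a new one per invocation.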
Create UD(T)Fs and stored procedures (SPs)
- A UDF can be either anonymous or named. For a named UDF, supply a value for the `name=<udf_name>` parameter.
  - a named temporary UDF can be called by name (e.g. with `call_udf`) anywhere in the same session
- Define UD(T)Fs and SPs using the decorators `udf`, `udtf`, `udaf` or `sproc`
  - for a permanent UDF, set `is_permanent=True` (default `False`) and provide a value for `stage_location`; temporary UDFs use the session stage (see `Session.get_session_stage()`)
- UD(T)F registration can supply `imports` and `packages`. These values override the session-level values when the function executes. Session-level values are defined by:
  - adding package dependencies, either
    - for all UDFs, using `session.add_packages`, or
    - specifically for each UDF, using the `@udf(packages=...)` decorator (overrides any packages added by `session.add_packages`)
    - note: the Snowpark library itself is not added automatically; list `snowflake-snowpark-python` explicitly, as the examples below do
  - optionally, adding user code/data files using `session.add_import`; local file paths are allowed and are uploaded automatically
- examples
```python
import pandas as pd
import snowflake.snowpark
import xgboost as xgb
from snowflake.snowpark.functions import sproc

# Both decorators register against the active Snowpark session, which must already exist.

# register an anonymous, temporary stored-proc with pinned package versions
@sproc(packages=["snowflake-snowpark-python", "pandas", "xgboost==1.5.0"])
def compute(session: snowflake.snowpark.Session) -> list:
    return [pd.__version__, xgb.__version__]

# register a permanent, named stored-proc
@sproc(name="minus_one", is_permanent=True, stage_location="@my_stage", replace=True,
       packages=["snowflake-snowpark-python"])
def minus_one(session: snowflake.snowpark.Session, x: int) -> int:
    return session.sql(f"select {x} - 1").collect()[0][0]
```
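The anonymous / named / permanent distinction from the bullets above can be sketched for a plain UDF with `session.udf.register`, the method form of the `@udf` decorator. This assumes a live Snowpark `session`; the names `add_one`, `add_one_tmp`, `add_one_perm`, the stage `@my_stage` and the column `a` are all hypothetical.

```python
def add_one(x: int) -> int:
    """UDF handler: plain Python, so its logic can be unit-tested without Snowflake."""
    return x + 1


def register_udfs(session):
    """Register add_one as anonymous-temporary, named-temporary and named-permanent.

    `session` is assumed to be a snowflake.snowpark.Session. Because add_one has
    type hints, return_type and input_types are left out.
    """
    # Anonymous, temporary: reachable only through the returned UDF object.
    anon = session.udf.register(add_one)

    # Named, temporary: also callable by name, but only within this session.
    session.udf.register(add_one, name="add_one_tmp", replace=True)

    # Named, permanent: the handler is uploaded to stage_location and the UDF
    # outlives the session. "@my_stage" is a hypothetical stage name.
    session.udf.register(
        add_one,
        name="add_one_perm",
        is_permanent=True,
        stage_location="@my_stage",
        replace=True,
    )
    return anon


def call_udfs(session, anon):
    """Apply the anonymous and the named temporary UDF to a small DataFrame."""
    # Imported here so the functions above can be tested without Snowpark installed.
    from snowflake.snowpark.functions import call_udf, col

    df = session.create_dataframe([[1], [2]], schema=["a"])  # hypothetical data
    return df.select(
        anon(col("a")),                     # through the returned UDF object
        call_udf("add_one_tmp", col("a")),  # by name, same session only
    ).collect()
```

Keeping the handler separate from registration is a design choice, not a Snowpark requirement: it lets the business logic be tested locally, while only `call_udfs` needs a connection.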