Connect Snowflake with Alteryx

MAY CONTAIN BAD ASS CONTENT: ANALYST DISCRETION IS ADVISED

Connect Snowflake with Alteryx

James Smith • August 13, 2020

Alteryx + Snowflake = Awesomeness

The flexibility and scale of Snowflake enables any user to bust through any misconceptions about data availability or data size. However, it does require a good deal of SQL knowledge to get insights and is more suited to engineers that analysts as direct users. When paired with Alteryx, the power of the Snowflake data engine plus all of the data sets in the Snowflake marketplace become instantly accessible in an intuitive drag and drop interface.

Getting your PC ready

I promise this will be the last admin step needed to achieve unlimited data superpowers. It is annoying I admit, but a small step compared to what you will achieve. The snowflake drivers do not come standard with windows machines so they have to be downloaded and installed. Note this process as to be repeated on all machines that will connect to snowflake (including server if you are lucky enough to have alteryx server)

The link is below (for a windows 64 bit machine). If you are using a 32-bit machine, please read the advisory on the main page and start a gofundme account to try and score a new laptop before attempting to continue. If you are using linux then I am certain you are not reading this step-by-step guide. If you are using a mac, you can afford to pay someone to do this for you and cant use Alteryx anyway (which I understand is a heated debate but best taken up elsewhere)

https://sfc-repo.snowflakecomputing.com/odbc/win64/latest/index.html

Making the connection

You can connect Alteryx to databases using both the regular input tool or the in database tool. For simplicity, I am focusing on the in-DB tools. The reality is the in-DB can easily perform the functions of the regular tool plus much more, so I recommend using them always when connecting to databases. Under in-database, click through to connect In-DB. Under datasource, you should see Snowflake as an option (if not review the first step).

File or User connection?

Decision time. Windows (and therefore Alteryx) allows for 2 types of ODBC connections, one is user based (where the setup is stored with the user of the machine) and the other is file based (where the credentials are stored in a file that can be shared). Personally, I like the file option because it's easily transportable (to a server for instance) or to share with a fellow analyst. However, this may give your security team the total creeps, so admittedly, the more secure way is a user based connection. Either way, choose your option. If you choose the file based method, you will need to specify a folder where the connection details will go.

This being Windows there are...well..a lot of windows. You will need to create your machine based connection to Snowflake first.

From here you need to enter in your parameters. There are a few gotchas here to understand.

1) Datasource - this is the name of the datasource (eg. SNOWFLAKE_CENSUS)

2) User/password - for your Snowflake account

3) Server - this is your Snowflake URL - WITHOUT the https://

4) Database, Schema - these are the DEFAULT locations when you sign-in. These can be changed but if not explicitly changed in a workflow will default here.

5) Warehouse - this is the compute resource this connection will use. This can't be changed (easily anyway) so I highly recommend you use a warehouse that is always on extra-small. If someone increases its size and you are happily blasting away with Alteryx you can get some bill surprises.

6) Role. This also can't easily be changed. I strongly suggest you don't use accountadmin for this unless this is your own personal playground-then you own the toys play with them as you wish :)

7) Tracing. This can be blank but I recommend 4 (minimal logging).

8) Authenticator/Proxy/No-Proxy - Leave this all blank. If you are using multi-factor and/or proxy servers lets hope you are not following this step by step guide. (If you are then email us and we can certainly help with setting this up).

Now, you will be asked to configure a write driver. THIS IS IMPORTANT. WAKE UP NOW!

If you plan on writing little or no data to Snowflake from Alteryx, then select same as read driver here. When I say little I typically mean 1000 rows at a time or less. This is because standard ODBC drivers insert into databases one row at a time. This is super inefficient even for Snowflake. If you do same as read driver and later decide you want to write 1,000,000 rows into snowflake with Alteryx you are

a) GOING TO BURN LOADS AND LOADS OF CASH

b) GOING TO THINK SNOWFLAKE IS HORRIBLE AND REGRET READING THIS BLOG

c) WILL GO BACK TO EXCEL AND LOSE ALL HOPE OF A CAREER IN DATA

If you do want to load lots of data into Snowflake fear not and use the bulk loading option. At the time of this writing, bulk loading requires an s3 bucket and account. However, there are alternative options in late stage beta and Demand Data has specifically built loaders for Azure and GCP (contact us for interest). Setting up the s3 loader in self-explanatory but a little tricky. If you get stuck, contact us and we can help.

**TECHNICAL ASIDE**** Read if you care. The bulk loader works by zipping your data into chunks and compressing it and uploading it into a temporary storage and then triggering Snowflake to read this data directly into the engine. It is amazingly efficient How efficient? We can load from Alteryx 1,000,000 rows (1GB) in under 2 minutes. Using the standard loader, this would take 24 hours or more.

PHEW - Now you are done with that. You should be able to pull up a list of tables and connect to one using standard SQL or the visual query builder.

< Older Post Newer Post >

Alter Snowflake with Dynamic SQL and Python

December 17, 2020

Anyone who follows me knows how much I love the combination of Alteryx and Snowflake. Alteryx provides a logic layer that enables any analyst to manipulate data and Snowflake provides the scale needed to get some amazing insights across large data sets. However, I have found there were always a few things missing. The Alteryx In-DB tools, while comprehensive, cannot do everything. There are some gaps (most notably - ahem - any Alteryx product managers reading this - simple Pivot or Unpivot tools in-DB). There is also a lot of functionality like changing warehouses, schemas or databases that cannot be easily done from within Alteryx.

Why You Need Real Time Right Now

November 2, 2020

In our previous post, we discussed why d emand forecasting matters and how an accurate demand forecast can mathematically and quantitatively improve your service level and/or inventory levels (cost). Now it is time to focus on the biggest uncertainty of them all . Time. Advanced forecasting algorithms can help eliminate uncertainty through reducing variables that are “unknown” by understanding relationships at a scale that no human could comprehend. This helps turn the seemingly “unknown” in to a “known”. As we stated earlier, advances in math and cloud computing make finding these relationships much easier and can be computed almost instantaneously. However, the element of time is a tricky compounding factor. Even little unknowns become big over time as time compounds all errors and introduces more uncertainties. The most certainty you have is in the precise moment you make an estimate. When you look ahead 1 day, 1 week or even 1-month (as most corporate S&OP) cycles still do – then you must start to rely less and less on facts and more and more on assumptions. Assumptions mean uncertainty and uncertainty means cost. There are loads of examples of this. When I lived in Florida, every September during hurricane season , we would get the dreaded cone of uncertainty any time a hurricane formed. As below shows , the further out you get, the expected position gets wider. By day 5 - the uncertainty stretched over 500 miles – hardly accurate enough to do any planning of significance .

In Supply Chain, Maths is cool again

By James Smith • September 14, 2020

You cannot go far these days without hearing about data science, digital transformation or AI. This is especially true in the supply chain discipline. However, when taking it back to basics, what AI, ML and Data Science are really doing are solving mathematical equations that are hardly new. These equations are just trying to optimize a series of variables to minimize cost and maximize service – something supply chain professionals have been doing for decades. So what has changed? Cloud Computing and Cloud Storage! Cloud Computing is enabling supply chain professionals to perform more math, faster. Cheap Storage is providing the ability to store more data about products and environmental variables which gives more fuel for the math to solve.