Skip to main content

XML Parsing for Beginners with Alteryx

Recently I’ve come across using XML parsing with web scraping and combining many XML tables. Because of its syntax, I wanted to outline some basic rules about XML and how to work with XML files within Alteryx.

XML files are different from flat columnar tables (the ones we’re used to!) because instead of headers and rows, the data is nested within tags, where the field headers are identified for every record with <> brackets. Below is the example from  Reading XML by Alteryx where the same data is presented as a columnar table and the other in XML. Because of its nested nature, it needs a bit of digging/examining the original data to figure out which tags to target and parse.

Example 1: Parsing from 1 XML file

Let’s go through one recent example I was working on for a client – parsing clinical trial data from XML files. Let’s say I wanted to 2 pieces of information – the name of the trial and when did it start. I like to take a look at the XML file to see where is the information located so I can figure out the best way to parse it.

I’ve opened the file in notepad++ and highlighted the tags which show which pieces of data I am looking for. Note that the root element in this file is “clinical study”, and it just so happens that <brief_title> and <start _date> are child values.

What happens when I select this file in Alteryx’s input tool?

Alteryx’s default settings is to pick up only 1 set of the file’s child values, so there’s a lot of information not yet coming through.

I can bring the information we are trying to parse into Alteryx by selecting “Return Root Element” and  Return Child Values” as the information we are looking for is only 1 layer beneath the Root Element.

Simply add a select tool for the fields we wanted to parse and voila!

But what if I want to parse data that is nested deeper within the XML file?

What if you wanted to find who sponsored the clinical trial?

The data is nested within the <agency> tags, so its not a child value of the root element. This is where can use the Outer XML configuration to bring in all XML tags within the root element. From there, we can parse out the <agency> tags with the ..

XML PARSE TOOL! XML PARSE TOOL!

Simply select “Return Root Element”  and “Return Outer XML” in the input tool so Alteryx can identify both the root element and its respective nested tags.

Then add the XML Parse tool and configure the tool to look for the <agency> tag within the new field “clinical_study_OuterXML”.

I’ve got the results window above with the <agency> data parsed!

XML parsing requires a bit of evaluating the dataset and identifying where your data is located. Because of its nested nature, you might have quite of a few XML parse tools in your workflow. But the same concepts can be taken into web scraping with Alteryx so learning XML and its tagging system has transferable value!

I hope this has been a clear introduction to working with XML files. Let me know if you have any questions in the comments section below.

LODs of Fun: How to Filter Moving Up a Level of Granularity

In my spare time, I’m working on a viz on global immunization rates for 1 year olds from the WHO. The dataset has 2 levels of geography – country and region.

This is quite simple to do in a “top-down” scenario (for instance, region to country), but how do we get this to work the other way around?

What if I wanted to select a country and see what other countries are in the same region? Specifically, for the map to zoom in on its respective region?

First off, there are a couple of clues we can pick up from this problem:

1) We can’t use a filter on country, because it will obviously filter to the selected country.

Rather we need to filter to 1 level above this, region. In my mind, I’m already sensing we’ll have to do an “if” statement of sorts.

2) Since we cannot create a filter and will have to use the if statement, we’ll be using a parameter.

Big thank you to Craig Bloodworth for helping me work through this problem 🙂 

How do we get this action to work?

Step 1: Create a parameter that determines which region was selected.

How to create a parameter:

Step 2: Create the calculation which drives the parameter:

Here’s a close up view of the calculation:

In this instance, when we select a country in the parameter, the IF statement either returns Null or the Region. We can see that working in this “sanity check” table.

You might be wondering why we need the MAX() function. Indeed, if we were to take out the max function we would get the same result in the table. Why even use it to begin with?

The reason why we cannot directly use a calculation where we specify the country, but ask it to return its respective Region is because it will only bring back data on the Region related to it, almost thinking it is the only stance of this Region. i.e. If we only select “Canada”, it will only retrieve regions where country=”Canada”. Since this is at a lower level of granularity, it will only bring back 1 stance of “America” within Region, instead of all instances of “America”.

Therefore, we need to create a scenario where the view is supplemented with extra context, achieved through a FIXED level of detail calculation.

However, Level of Detail calculations can only be done on aggregations and fundamentally, it has to be a measure. That means in order for us to access the ability to use a LOD calculation, we need to turn our parameter control function into an aggregation – our MAX() function. We will build the LOD calc in the next step.

This is the first time I’ve used a traditional dimension and processed it as a measure and aggregation. Totally turned my head upside down!

Step 3: Creating the filter calculation for 1 level above the detail in the view

Our first instinct here might be to create this calculation.

But of course it doesn’t work because our selected region is still an aggregation thanks to our MAX() function. We can remove the aggregation with the { } brackets, creating a FIXED LOD calculation. Remember it has to be fixed in order for the the data to go past the filter. Yay to the order of operations!

Correct calculation

Bring the calculation to filter and voila! The functionality will start working in the table.

It looks even better with a map 🙂

lods

Happy Vizzing!

New Viz: How has the World Responded to the Refugee Crisis?

Whipped this together between my time on the train to work and in my spare time. Quite proud to have figured out how to get around this “can’t sort on a table calc” business with an LOD.  Probably one of the simplest pieces of design I’ve built, but the data speaks for its self.

Check out how your country has responded to the refugee crisis. While I am Canadian, I was definitely surprised to see how the US responded.

Making Sankey Charts with Tableau and Alteryx – Download the Template to Make Your Own

Here’s my first step to making this a scalable viz for anyone interested in building this without all the legwork.

There’s a lot of demand to scale such a viz so I thought I would put my 2 cents in. It only works for visualizations with 5 stages and where each unique ID moves from dimension to dimension (rather than the flow of measures) –  but still a start!

Why 5 Stages?

I’ve chosen to release the first iteration with 5 stages because of its potential application for sales pipelines: lead, prospect, qualified prospect, committed and transacted. Apply this to your Salesforce data to visualize common customer paths and areas of improvement.

With this method, I built this beautiful Sankey below outlining a friends’ job search.


How do you get your hands on such treasures? You’ll need Alteryx Designer and Tableau Desktop (both which offer trial versions in case you are new to either software.)

Step 1: Download my Alteryx workflow from the Alteryx Gallery. 

Run the Alteryx workflow with your dataset, selecting which fields are in Stage 1 through 5. Then set the where you’ll want your .tde to save to on your machine. Note that this method only works for 5 Stage Sankey charts because the table calculations will break otherwise. Also your stages need to be field values before running the Alteryx workflow.

Step 2:  Download the template Tableau workbook

Once downloaded, switch the .tde connection with the new .tde you’ve produced from the workflow.

Here’s where you’ll need to click to switch the .tde files.

Once the switch is made, your viz should magically appear! Just change the alias’s to match your desired results and you’ve got a Sankey chart in 10 minutes!

Let me know what you think of the workflow/workbook or anything else in the post. More than happy to clarify if there are questions. I’m hoping this will be useful to at least someone on the interwebs!

How to Instantly Make Your Vizzes Look Amazing

Use 1 Color for your entire viz

One way to instantly give your viz an upgrade is to change the color palette to be 1 color, and everything else to be a gradient of 1 color. It’s a pretty common piece of advice from Andy Kriebel and Cole Nussbaumer and no doubt it’s effective.

I tried it out here with this viz while I was at the Data School:

However, I immediately found roadblocks with this piece of advice.

What happens when I want to add multiple vizzes to a dashboard– what of those other dimensions? What happens when I want to show good/bad performance, usually encoded with contrasting colors? I can’t use red/green because its not visual best practice, but how many vizzes of blue and orange can one take? As for different dimensions, maybe lighter colors for certain dimensions and darker for the remaining.. but I’m not particularly happy with that solution. It also doesn’t help that I don’t come from a design background and feel hindered at a lack of a sophisticated solution.

Then I found this AMAZING artist – Dave Thompson.

I picked up postcards by him from my local Paperchase and was immediately drawn to the color choices. Its perfect for data visualization because it’s so soothing to look at, yet distinctive even between related dimensions.

I finally found a color inspiration guide for dashboards –  different vizzes with multiple dimensions.

1 color family never made sense to me within this context– people just get confused… here I’ve finally found an instruction guide on how to make that work. You can check out my first viz (a makeover) inspired by the color theory in this piece of his:

My makeover of a NYT article on college majors likely to move back home post graduation. Click on the gif to go to the viz.

The feedback on this viz has been great – even got this back from the guy at TIME.

I’m still going to learn more about color theory and elaborate on this (and hopefully I’ll start to develop my own sense of what works best for UI and story).  For now, I’m gonna be using these training wheels until I get up to speed.

Did you find this useful?

Let me know if this helps you too! It’s my first blog on my own this data viz. Hope you’ll follow along 😀

Emily