atomic.events are deprecated (there’s now a better way)
The amazing analytics unconference MeasureCamp Sydney was held on Saturday 20 October and it was the best yet! We’ve been involved from the beginning as sponsors and organisers because we love the format and community, so it was great to be part of the third edition.
Mistakes happen. In the data world, your ugly mistakes live on forever. It’s not just the embarrassment that’s a problem though. Gaps and obvious errors in historical data distract your stakeholders from more important matters. Explaining the anomalies and steering your data users back to the things that actually matter is tiring for everyone.
Adam Greco is something of a legend in the Adobe Analytics space. I’ve been reading his blog posts and learning from him since I first started using Omniture back in 2007 or so. He literally wrote the book on Omniture and then Adobe SiteCatalyst. The reason his blog was so useful is that very few people were writing about advanced digital analytics at the time. Between Adam and Ben Gaines, I learnt much of what I know about the then-emerging discipline.
In the first four parts of this series, we modelled out pageviews with accurate time spent and scroll depth, classified sessions, and looked at what we know about users.
In the first three parts of this series, we looked at modelling out pageview events to include accurate time spent and scroll depth, then classifying sessions based on what we know about where the user came from. Now we’re going to look at what we know about users.
In the first two parts of this series, we looked at modelling out pageview events to include accurate time spent and scroll depths. Now we’ll roll up sessions.
In the first part of this series on data modelling we went through the background for building a data model. In this edition we’ll go through the steps to create a basic pageview model that incorporates page pings so we can see accurate time spent and scroll depth for each pageview.
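If you want a feel for the shape of that roll-up before diving in, here’s a minimal sketch; the table, column names and 10-second ping interval are assumptions for illustration, not the exact model from the post:

```python
# Hypothetical page-ping roll-up against Snowplow's atomic.events table.
# Assumes a 10-second heartbeat and psycopg2; adjust names to your own model.
import psycopg2

PAGEVIEW_MODEL_SQL = """
SELECT
    domain_sessionid,
    page_urlpath,
    MIN(derived_tstamp) AS pageview_tstamp,
    -- each page ping represents roughly 10 seconds of engaged time
    10 * SUM(CASE WHEN event = 'page_ping' THEN 1 ELSE 0 END) AS time_spent_seconds,
    -- deepest scroll position observed across the pings
    MAX(pp_yoffset_max) AS max_scroll_depth_px
FROM atomic.events
WHERE event IN ('page_view', 'page_ping')
GROUP BY 1, 2
"""

with psycopg2.connect("dbname=snowplow host=example-cluster port=5439 user=analyst") as conn:
    with conn.cursor() as cur:
        cur.execute(PAGEVIEW_MODEL_SQL)
        pageviews = cur.fetchall()
```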
One of the important features of Snowplow is that you can build your own custom data models to suit your unique analytical requirements in a space-efficient and performant way. It’s important that your data model can evolve and grow in complexity as your business grows and your needs get more advanced.
Snowplow usage continues to go up and to the right, with new and interesting use cases proliferating. Alongside this we’ve seen a steady increase in usage of our debug tool, Snowplow Inspector.
Download the JDBC driver from AWS and place it in the DataGrip JDBC driver directory. On Linux this was
Last Saturday Mike and I made it across to the first MeasureCamp Auckland. We’ve sponsored all the previous events in Sydney and Melbourne so we were thrilled to help out the Auckland team, who did a fantastic job with a really good venue, excellent food and a really diverse crowd keen to get into it.
In case you’ve been living in a cave, the EU’s General Data Protection Regulation came into force on Friday last week. This is Europe’s latest crack at regulating consumer privacy in the digital era, and it’s definitely a very pro-consumer regulation. It’s going to have some big impacts on the business models of a bunch of businesses.
DataGrip is one of the most valuable tools our engineers use for exploring and querying a myriad of database technologies. DataGrip doesn’t yet come bundled with a BigQuery driver, so in this post we’ll explore how to set up a custom data source so that you can connect to BigQuery using DataGrip.
A few weeks back Nicholas Tan gave a presentation at the Future of Financial Services conference about real-world architectural designs for getting value from data. Nick was most recently responsible for News Corp’s large-scale Snowplow Analytics rollout and has just started at Macquarie Group. Check out his presentation.
Back in July, we did a bunch of work to quantify the benefits of ZSTD in the Redshift database, resulting in this blog post from Mike. The results were clear and massive: at least 50% reductions in storage use and improvements in nearly all use cases. We started migrating our customers to ZSTD wherever possible so they could benefit from this huge improvement.
Snowflake Analytics sponsored the always awesome MeasureCamp Sydney unconference last weekend. As usual it was an incredibly high-value event with really great sessions and even better informal chats in between. Such a great event, and we’re looking forward to Melbourne early next year.
A few weeks ago I discovered Monica Rogati’s fantastic Data Science Hierarchy of Needs. It’s a data science-centric riff on Maslow’s Hierarchy of Needs, a classic concept in psychology. I’ve found myself using Rogati’s diagram and the concept in conversations with colleagues, partners, customers and friends ever since, as a way to explain the challenges we face in the digital analytics space.
We are delighted to announce that Conrad Yiu has joined Snowflake Analytics as an Advisory Board member. Conrad brings more than 20 years of experience as an entrepreneur, venture investor and business builder.
Tonight Snowflake Analytics team members Mike and Narbeh are debating the merits of Snowplow Analytics with representatives of Google Analytics and Adobe Analytics at Web Analytics Wednesday. The head-to-head aspect of it is meant to be lighthearted, but it’s forced us to think about some of the ways Snowplow Analytics is a better match for many types of digital analytics problems.
A new compression option in Redshift allows you to make big storage savings, up to two-thirds in our tests, over the standard Snowplow setup. This guide shows how it works and how to get it happening.
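As a taster of what the migration looks like, here’s a hedged sketch using psycopg2; the table, columns and keys are illustrative, not the exact DDL from the guide. Because existing column encodings can’t be changed in place, the approach boils down to a deep copy into a ZSTD-encoded table:

```python
# Illustrative only: create a ZSTD-encoded copy of an events table and deep-copy into it.
# Table, column and key names are assumptions, not the exact DDL from the guide.
import psycopg2

DDL = """
CREATE TABLE atomic.events_zstd (
    event_id          CHAR(36)      ENCODE ZSTD,
    collector_tstamp  TIMESTAMP     ENCODE ZSTD,
    page_url          VARCHAR(4096) ENCODE ZSTD
)
DISTSTYLE KEY DISTKEY (event_id)
SORTKEY (collector_tstamp)
"""

DEEP_COPY = """
INSERT INTO atomic.events_zstd
SELECT event_id, collector_tstamp, page_url
FROM atomic.events
"""

with psycopg2.connect("dbname=snowplow host=example-cluster port=5439 user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)        # new table with ZSTD column encodings
        cur.execute(DEEP_COPY)  # deep copy, then swap the tables once you're happy
```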
Snowplow Insights is an amazingly flexible way to collect data, but with great flexibility comes some complexity. If you work on Snowplow implementations a lot, you’re likely familiar with Base 64 Decode and JSON Formatter when you’re digging into custom contexts and testing your implementation.
Web analytics tools commonly have a Time Spent metric. Understanding how long people have spent reading a page is a really valuable thing for some businesses. For publishers, the quality of engagement with content is vital, given they’re effectively selling the attention of their readers.
In this tutorial we’ll look at decoding the bad rows data that comes out of the Snowplow real-time pipeline. In the real-time pipeline, bad rows inserted into Elasticsearch (and S3) are stored as base64-encoded, binary-serialized Thrift records. We’ll walk step by step through how to first decode and then deserialize these records in Python.
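For a flavour of what’s involved, here’s a minimal sketch; it assumes you’ve generated Python bindings from Snowplow’s collector-payload.thrift schema and that the bad-row document carries the record in a base64-encoded `line` field, so treat the names as assumptions rather than the tutorial’s exact code:

```python
# Hypothetical decoder for a single bad-row JSON document.
# Assumes `thrift --gen py collector-payload.thrift` produced the CollectorPayload class.
import base64
import json

from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport

from collector_payload.ttypes import CollectorPayload  # generated bindings (assumed module name)


def decode_bad_row(bad_row_json: str) -> CollectorPayload:
    # The "line" field holds the base64-encoded, binary-serialized Thrift record.
    raw = base64.b64decode(json.loads(bad_row_json)["line"])

    # Deserialize the bytes into a CollectorPayload struct.
    transport = TTransport.TMemoryBuffer(raw)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    payload = CollectorPayload()
    payload.read(protocol)
    return payload
```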
In this tutorial we’ll use AWS Lambda and Amazon CloudWatch to set up monitoring for the number of bad rows inserted into Elasticsearch over a period of time. This lets us set an alert on a bad-row threshold and generate an email or notification whenever that threshold is exceeded. Snowplow users on the real-time pipeline will find this most useful, but users running batch loads can also adapt this monitoring.
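The heart of the Lambda is only a few lines. Here’s a hedged sketch; the Elasticsearch endpoint, index, timestamp field and metric namespace are placeholders rather than the exact setup from the tutorial, and the alerting itself comes from a CloudWatch alarm you point at the published metric:

```python
# Hypothetical Lambda handler: count recent bad rows in Elasticsearch and
# publish the count as a custom CloudWatch metric for an alarm to watch.
import json
import urllib.request

import boto3

ES_COUNT_URL = "https://search-example.us-east-1.es.amazonaws.com/bad-rows/_count"  # placeholder endpoint
cloudwatch = boto3.client("cloudwatch")


def handler(event, context):
    # Count bad rows that landed in the last hour.
    query = json.dumps({"query": {"range": {"failure_tstamp": {"gte": "now-1h"}}}}).encode()
    request = urllib.request.Request(ES_COUNT_URL, data=query, headers={"Content-Type": "application/json"})
    count = json.loads(urllib.request.urlopen(request).read())["count"]

    # Publish the count; a CloudWatch alarm on this metric sends the email or notification.
    cloudwatch.put_metric_data(
        Namespace="Snowplow",
        MetricData=[{"MetricName": "BadRows", "Value": count, "Unit": "Count"}],
    )
    return count
```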