November 7th and 8th 2013

Kinépolis Madrid, Spain



We are lining up some of the most relevant industry leaders in Big Data for keynote sessions. The deadline for submitting proposals has passed, so we will soon reveal the definitive list of speakers.

BigQuery over a Bitcoin dataset - workshop

This workshop combines the appeal of Bitcoin data with the power of Google's solutions. Google BigQuery is the cloud tool and API that allows data explorers to focus on what ultimately matters: the data and its possibilities. In this workshop we'll get hands-on experience with datasets ripe for exploring. We'll quickly present the basic building blocks so that participants can ask questions that might never have been asked before - and get answers in mere seconds.

Paradigma's Alberto Gómez Toribio @gotoalberto helped compile a never-before-released dataset of the Bitcoin blockchain.

NOTICE: Google BigQuery has a free monthly querying quota that you'll be able to use at the workshop. BigDataSpain will email you a special code to apply for a $1000 credit - to take your querying even further. Please apply before Friday, so your credit can be approved in time for the workshop.

Google's Sponsorship:
Google Cloud Platform is offering all BigDataSpain attendees $2,000 of credit to get started building your web or mobile app. Apply with the code we will give you at the conference. This offer includes $1,000 in Google App Engine credit and $1,000 in Google Compute Engine credit. App Engine is a full development stack (PaaS), and Compute Engine lets you run workloads on Linux virtual machines (IaaS).

About the Bitcoin dataset

The dataset consists of text files (600+ MB compressed, 5 GB uncompressed) containing information about Bitcoin transaction IDs, from transaction f7883 (in 2012) onwards to some point in 2013. Each transaction is stored as one or more records made of the fields described below.

You can print the records of an individual transaction with cat and grep:

$ cat 254204.tx.csv | grep 6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112

whose output is

1Bet32kBtZzXViMs1PQHninHs4LADhCwtB;0.01;1Bet32kBtZzXViMs1PQHninHs4LADhCwtB;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.01

1Bet32kBtZzXViMs1PQHninHs4LADhCwtB;0.01;1Bet32kBtZzXViMs1PQHninHs4LADhCwtB;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.01

1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0.0094454;1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.0094454

1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0.0094454;1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.0094454

1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0.0088168;1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.0088168

1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0.0088168;1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3;0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3;6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112;2013-08-25 19:28:17;0.0088168

You see 6 records, corresponding to 2 inputs and 3 outputs: 2 x 3 = 6.
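As a rough illustration of how these records could be inspected outside the shell, the sketch below rebuilds the six rows shown above and groups them in Python. The column labels (`address`, `amount`, `block_id`, and so on) are hypothetical names chosen for this example; they are assumptions about the column order, not an official schema of the dataset.

```python
import csv
import io
from collections import Counter

BLOCK = "0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3"
TX = "6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112"
WHEN = "2013-08-25 19:28:17"

# The three (address, amount) pairs seen in the grep output above.
PAIRS = [
    ("1Bet32kBtZzXViMs1PQHninHs4LADhCwtB", "0.01"),
    ("1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3", "0.0094454"),
    ("1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3", "0.0088168"),
]

# Rebuild the six semicolon-separated rows: each pair appears twice.
SAMPLE = "\n".join(
    ";".join([addr, amt, addr, BLOCK, TX, WHEN, amt])
    for addr, amt in PAIRS
    for _ in range(2)
)

# Hypothetical column labels (assumed order, not an official schema).
COLUMNS = ["address", "amount", "address_again", "block_id",
           "tx_id", "date", "amount_again"]


def parse(text):
    """Split semicolon-separated records into dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [dict(zip(COLUMNS, row)) for row in reader if row]


rows = parse(SAMPLE)
distinct = Counter((r["address"], r["amount"]) for r in rows)

print(len(rows), "rows,", len(distinct), "distinct (address, amount) pairs")
for (address, amount), repeats in distinct.items():
    print(address, amount, "x", repeats)
```

Grouping this way shows three distinct (address, amount) pairs, each repeated twice, which is consistent with the 2 x 3 = 6 count.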

The fields are:

- Origin address: account sourcing the money

- Destination address: account receiving the money

- Output of the previous transaction: helps calculate the balance of accounts

- Amount: 1 BTC is currently worth 1

- Date: time when the transaction was first recorded, in GMT+0, with a tolerance of +/- 3 minutes

- Transaction ID: field to consolidate several senders and receivers of a transaction

- Block ID: the block where the transaction was inserted

- Block Height: ID of the relation, amount and order of the blocks

There are lots of things that can be queried in the dataset, from temporal patterns to filtering, etc.

These are some questions that can be answered with this dataset:

- Find (or predict) temporal patterns in the transactions: look at world conflicts and government and economic news, and compare them with the data. Is Bitcoin a safe-haven investment?

- There are 1,320 million euros' worth of Bitcoins in circulation, but which is the busiest account?

- Most people sleep from 23:00 to 08:00 and are active from 08:00 to 23:00. Which timezone has the most transactions? Which timezone has the largest transaction volume by amount?

- Are some accounts being used to obfuscate movements? You can create patterns to detect this and estimate the volume of these transactions.
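
Two of the questions above (the busiest account and the hours with the most activity) can be prototyped on a small sample before running them at scale. The following is a minimal sketch in Python, not the workshop's BigQuery solution: the rows are hypothetical placeholders, not real data, and the helper names `busiest_accounts` and `transactions_per_hour` are invented for this example. Because one transaction spans several records, both helpers count distinct transaction IDs rather than raw rows.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical rows: (address, tx_id, date in GMT+0). Not real data.
ROWS = [
    ("addr-A", "tx-1", "2013-08-25 19:28:17"),
    ("addr-A", "tx-1", "2013-08-25 19:28:17"),  # same tx repeated across records
    ("addr-B", "tx-1", "2013-08-25 19:28:17"),
    ("addr-A", "tx-2", "2013-08-25 23:05:00"),
    ("addr-C", "tx-3", "2013-08-26 19:40:10"),
]


def busiest_accounts(rows, top=3):
    """Rank addresses by the number of distinct transactions they appear in."""
    seen = defaultdict(set)
    for address, tx_id, _ in rows:
        seen[address].add(tx_id)
    counts = Counter({address: len(txs) for address, txs in seen.items()})
    return counts.most_common(top)


def transactions_per_hour(rows):
    """Count distinct transactions per hour of day (0-23, GMT+0)."""
    hour_of = {}
    for _, tx_id, date in rows:
        hour_of[tx_id] = datetime.strptime(date, "%Y-%m-%d %H:%M:%S").hour
    return Counter(hour_of.values())


print(busiest_accounts(ROWS))
print(transactions_per_hour(ROWS))
```

For the timezone question, one possible approach is to shift the hourly histogram by each candidate offset from GMT+0 and see which offset places the most activity inside the 08:00 to 23:00 window.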