Historical data stream provides fragmented data


This topic contains 6 replies, has 4 voices, and was last updated by  jamesinealing 4 years, 1 month ago.

Viewing 7 posts - 1 through 7 (of 7 total)
  • #75032

    mcg05
    Participant

    Hi everyone,

    Last week, I signed up to the TFL website to access the historical “Barclays Cycle Hire statistics feed” and I immediately received an email with a link to the feed.

    The link allowed me to download a 10.6 MB zip file (~40 MB of CSV). However, instead of the advertised “31 May 2011 to 4 February 2012” data, the zip file contained 4 files' worth of data for the following dates: 01/02/12-06/02/12, 29/02/12-02/03/12, 01/04/12-15/04/12 and 11/05/12-24/05/12. Each file contained only 65536 entries, which seems rather low given the weekly stats shown at http://www.tfl.gov.uk/roadusers/cycling/20389.aspx

    Has anyone else experienced this? Also, does anyone know where/how to get the actual “31 May 2011 to 4 February 2012” data containing all journeys?

    Cheers

    Chris

    #85483

    Bimbles
    Participant

    65536? Isn’t that the old Excel row limit?

    Could say something about how they extracted the data…

    #85484

    mcg05
    Participant

    That could be the case, indeed. To be sure it wasn’t my OpenOffice causing the issue I checked the files in VIM and again I only got 65536 entries in each of the 4 files.
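For anyone who wants to repeat this check without opening the files in an editor, here is a minimal sketch that counts the rows of every CSV inside a zip archive and flags any file that stops exactly at 65536, the per-sheet row limit of the legacy Excel .xls format. The function names are made up for illustration, and the UTF-8 encoding is an assumption about the feed, not something TfL documents in this thread. The count includes a header row if there is one.

```python
import csv
import io
import zipfile

# Maximum number of rows per sheet in the legacy Excel .xls format.
EXCEL_XLS_ROW_LIMIT = 65536


def count_rows(text_stream):
    """Count CSV records (including any header row) in an open text stream."""
    return sum(1 for _ in csv.reader(text_stream))


def row_counts_in_zip(zip_source):
    """Return {member name: row count} for every .csv member of a zip archive.

    zip_source may be a path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: the files are UTF-8 encoded text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                counts[name] = count_rows(text)
    return counts


def looks_truncated(row_count, limit=EXCEL_XLS_ROW_LIMIT):
    """True when a file stops exactly at the old Excel row limit."""
    return row_count == limit
```

Calling `row_counts_in_zip("cycle-hire.zip")` (a hypothetical file name) and passing each count to `looks_truncated` would show at a glance which files were probably cut off during export.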

    I have contacted TfL about this.

    In the meantime, would anyone mind sharing their “31 May 2011 to 4 February 2012” data set with me?

    #85485

    Matt
    Participant

    I don’t think I have a copy I’m afraid. If you need more recent data (the last few weeks) let me know and I’ll export it from my database.

    It has got to 2.5m rows now though…!

    #85486

    mcg05
    Participant

    @blighter. Thank you for the kind offer. I was just about to enquire as to how we could best exchange the data when I got the following email from TfL:

    Code:
    Hi Chris

    Thanks for your email.

    The increasing popularity of the scheme and the size of the data mean we cannot backdate as far as we used to. We will go back as far as we can and make sure we have a complete dataset and that the date ranges are correctly specified.

    This work will take up to 2 weeks to complete.

    Thanks

    TfL Online

    From: Guenther, Chris
    To: Developers
    Subject: Missing cycle hire data

    Hi,

    I registered for the following data stream:

    “Barclays Cycle Hire statistics” – Details of all Barclays Cycle Hire journeys made from 31 May 2011 to 4 February 2012

    and I subsequently received an email with the following link:

    However, after I downloaded the file I realised that it only contains 4 CSV files with 65536 hire entries
    each (dates: 01/02/12-06/02/12, 29/02/12-02/03/12, 01/04/12-15/04/12 and 11/05/12-24/05/12).

    Is it still possible to get the data of all the journeys made “from 31 May 2011 to 4 February 2012”
    or a similar more recent data set, say “from 4 February 2012 until 1 June 2012”?

    Kind regards
    Chris

    I will try again in two weeks’ time, but I might come back to your offer if it doesn’t work.

    Cheers

    Chris

    #85487

    jamesinealing
    Participant

    Replying to @mcg05‘s post:

    Can I just ask: I signed up for the stats, but the latest data is from 25/7 and the API page says n/a against update frequency. Does anyone know how frequently they update it in practice, and what the typical lag is in the data they provide?

    Thanks

    #85488

    jamesinealing
    Participant

    This is the response from TfL:

    “They update every 3-6 months, we are just waiting for the data to the end of October to be ready – it should be up by the end of the month. Please note we only release 6 months of data, so if you require historical data please ensure you save the current dataset, as Feb-Apr data will soon be removed.”

    I’m seeking clarification on exactly what they mean by this!

    On the original point, the 6 files now look to have a complete date range, though 29/02-03/03 appears to be duplicated across parts 1 & 2, and there are some very odd records with dates of 1900 and 1901 in, I think, part 4!
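A rough way to spot both problems (journeys repeated across files, and records with implausible dates) is sketched below. The column names `Rental Id` and `Start Date`, the date format, and the 2010 cut-off year are all assumptions for illustration; check them against the actual CSV header before relying on this.

```python
from datetime import datetime


def audit_journeys(rows, id_field="Rental Id", date_field="Start Date",
                   date_format="%d/%m/%Y %H:%M", earliest_year=2010):
    """Return (duplicate_ids, suspicious_ids) for an iterable of row dicts.

    duplicate_ids: ids seen more than once (listed once per repeat).
    suspicious_ids: ids whose start date falls before earliest_year.
    The field names, date format and cut-off year are assumed, not documented.
    """
    seen = set()
    duplicate_ids = []
    suspicious_ids = []
    for row in rows:
        rental_id = row[id_field]
        if rental_id in seen:
            duplicate_ids.append(rental_id)
            continue  # already checked this journey's date the first time
        seen.add(rental_id)
        started = datetime.strptime(row[date_field], date_format)
        if started.year < earliest_year:
            suspicious_ids.append(rental_id)
    return duplicate_ids, suspicious_ids
```

Feeding it the rows of all parts chained together, for example from `csv.DictReader`, should surface the overlap between parts 1 and 2 as duplicate ids and the 1900/1901 records as suspicious ones.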
