April 24, 2006

Data updated through 4/24/2006

That's right! New data is here.

Watch those prices go up! 

April 16, 2006

Data Updated through April 16, 2006

The new data for this past week has been processed.

April 15, 2006

Zip Code, City & Layout

The zip code field is now user entered and will move the map to that area.

The city drop-down will also center the map on the selected city. I still need to add this to the county field.

The main page has a much better layout. (I’m still on the search for a nice looking template to make it look better.) And I finally figured out some div layout issues I was struggling with.

I’ve been looking for a decent date picker to replace the drop downs for date. But I haven’t found anything yet. That's the next thing to change once I find one.

April 10, 2006

Updated Data through 4/9/2006

Data is updated through April 9th, 2006. The next available data will be this next Sunday, April 16, 2006.

April 09, 2006

New Features, Upgrade Google Maps to v2, Data 99% Processed

Late last night I finished the first pass of all the data. The data gets ‘released’ each Sunday, so there will be more data today.

Per d$’s request there’s a new feature. When you put your mouse over the address, it’ll change the color to green if it’s in the viewable map, or red if it’s not.
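For the curious, the green/red check boils down to asking whether a home's coordinates fall inside the map's current bounds. The site does this in JavaScript against the Google Maps API; what follows is only a rough Python sketch of the logic. The `Bounds` class and `hover_color` function are made-up names for illustration, and the sketch assumes the viewport never crosses the 180° meridian (safe for the Bay Area).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical stand-in for the map's visible rectangle."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lng: float) -> bool:
        # Assumes the viewport does not wrap around the 180° meridian.
        return (self.south <= lat <= self.north
                and self.west <= lng <= self.east)


def hover_color(bounds: Bounds, lat: float, lng: float) -> str:
    """Green if the home is inside the visible map, red otherwise."""
    return "green" if bounds.contains(lat, lng) else "red"
```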

Now it’s possible to be more specific with the range of prices to search on. The price range has been split from one restrictive drop-down into two, one for the minimum and one for the maximum. (Another d$ request.)

There’s also a new filter for free form text. You can search on any part of the address:

  • Street number, part of the street number
  • Street name, part of the street name
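A free-form filter like this can be as simple as a substring match on the address. Here is a minimal Python sketch of that idea; the case-insensitive matching, the `filter_by_address` name, and the dictionary shape of each sale are all assumptions for illustration, not how the site itself is built.

```python
def filter_by_address(sales, text):
    """Keep sales whose address contains `text` (case-insensitive).

    `sales` is assumed to be a list of dicts with an "address" key.
    An empty search string matches everything.
    """
    needle = text.strip().lower()
    if not needle:
        return list(sales)
    return [sale for sale in sales if needle in sale["address"].lower()]
```

Searching for "12" would match both "123 Example St" and "4512 Sample Ave", which is what lets part of a street number or part of a street name work.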

I upgraded the Google Maps API to version 2 today. I started having some random problems that weren’t that consistent and seemed to be memory related (lags and such). They were most likely caused by all the testing, so I decided it was time to use the new version. They even mentioned memory leaks on their website.

There were about 20 XML files that were in my log as being in error. (Each XML file was one date and county on the old system; now it’s getting put in a MySQL db.) I went through those and got them set up to be processed.

April 07, 2006

New Data: Now there's 145,057 home sales listed

I’m still working on populating the data. There are so many data points. The real estate market around here is pretty crazy.
The data that just went in is the first pass of the following dates:

04.27.03 - 05.30.04
04.03.05 - 11.16.05

Which means we’re missing 05.30.04 - 04.03.05 and 11.16.05 – now (04.07.06)
There’s about 16 months of data to go. It should be done by tomorrow afternoon.

What does ‘first pass’ mean? Well, I first geocode the address with a CGI script that I built (instructions can be found on geocoder.us). If the address isn’t found there I try geocoding using Yahoo’s Geocoding Service. However, sometimes I go over the 5000 permitted per day and it’s no longer an option until the next day.
The next pass will be using Yahoo's geocoder on all the addresses that didn't resolve on mine, and then maybe taking a look at some of the data. There was one address that I noticed that said West San Jose as its city. I don't think that exists. So I might have to work some of that out another way.
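The first pass described above is a fallback chain with a daily cap on the second service. A rough Python sketch of that flow follows. It makes no real network calls: the two geocoders are passed in as plain functions, and `QuotaGeocoder` and `first_pass` are hypothetical names. Resetting the counter each day is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

LatLng = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLng]]


class QuotaGeocoder:
    """Wraps a geocoder and refuses calls once a daily limit is used up."""

    def __init__(self, geocode: Geocoder, daily_limit: int = 5000):
        self._geocode = geocode
        self.daily_limit = daily_limit
        self.used = 0

    def available(self) -> bool:
        return self.used < self.daily_limit

    def __call__(self, address: str) -> Optional[LatLng]:
        if not self.available():
            return None  # over quota: not an option until the next day
        self.used += 1
        return self._geocode(address)


def first_pass(addresses: List[str], local: Geocoder,
               fallback: Geocoder) -> Tuple[Dict[str, LatLng], List[str]]:
    """Try the local geocoder first, then the fallback service.

    Returns the resolved addresses plus the ones left for a later pass.
    """
    resolved: Dict[str, LatLng] = {}
    unresolved: List[str] = []
    for address in addresses:
        point = local(address)
        if point is None:
            point = fallback(address)
        if point is None:
            unresolved.append(address)
        else:
            resolved[address] = point
    return resolved, unresolved
```

Anything that lands in the unresolved list is what the next pass would pick up once the quota is available again.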

Project Overview

Well, I guess it’s time to tell a little about the project.

Basically it all started when we were in the process of buying a house and I was dismayed at the tools available to compare and ‘see’ the homes being sold, or those already sold. I’m a data person and what was available just didn’t work for me.

I eventually ran across SFGate's Homesales, which lists homes-sold data in a tidy, but pretty much useless, list format. I had also been playing with the Google Maps API a bit (mapping skate parks with pictures and directions).

So, I decided to start playing with the data. It started out using Ruby to scrape the data and geocoder.us to geocode the addresses, with the results going into XML files that a JavaScript front end was pulling and parsing for the map. (Ruby is the bomb, you should check it out sometime.) But each XML file consisted of one week's worth of data for one county. So looking at, say, a month's worth of data for the entire Bay Area meant about 36 XML files to be grabbed, parsed, filtered, and displayed. Which, as you can guess, was pretty slow, especially considering that if you were filtering for a particular price range, you still had to load all that data.
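The "about 36" figure is just weeks times counties. A tiny Python sketch of the arithmetic, assuming four weekly releases in a month and the usual nine Bay Area counties (the county count is an assumption; the post doesn't state it):

```python
def files_needed(weeks: int, counties: int) -> int:
    """One XML file per (week, county) pair on the old system."""
    return weeks * counties


# Assumed inputs: 4 weekly releases, 9 counties.
month_of_files = files_needed(weeks=4, counties=9)
```

Every one of those files had to be downloaded and parsed in the browser before any filter could throw rows away.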

And then I got busy with other parts of my life.

Recently, I acquired a new job. It seemed like a great place, but it was a terrible fit for me. (It had nothing to do with the people there. It was other reasons.) So I quit.

And suddenly had some time to play!

I built a personal copy of the TIGER/Line data from the US Census. Now I have a CGI script that will geocode fast, and I’ve learned some Perl in the process. I have a bunch of PHP that took those XML files, scraped them, and put them into a MySQL database which the front end uses. And it’s so much faster. (Like, duh!) I added some filters and have been toying with other parts too. Oh, and I’m using AJAX to get the data to the front end, with PHP for everything on the back end. I fought with bringing XML to the front end using AJAX, but it basically sucked. Then I replaced the XML with JSON, and all I have to say is WOW! Stop fighting with XML and just use JSON. It’s so much easier.
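To give a feel for why the database-plus-JSON setup is faster, here is a rough Python sketch of the same idea: let the database do the price filtering, then hand only the matching rows to the front end as JSON. This is not the site's code. The real back end is PHP and MySQL; the sketch swaps in Python's built-in `sqlite3` so it runs anywhere, and the table layout, function names, and sample rows are all made up.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with a few made-up sales."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (address TEXT, price INTEGER, lat REAL, lng REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("123 Example St", 450000, 37.70, -122.40),
            ("456 Sample Ave", 650000, 37.80, -122.30),
            ("789 Placeholder Rd", 900000, 37.90, -122.20),
        ],
    )
    return conn


def sales_as_json(conn: sqlite3.Connection, min_price: int, max_price: int) -> str:
    """Filter in the database, then serialize just the matches as JSON."""
    rows = conn.execute(
        "SELECT address, price, lat, lng FROM sales "
        "WHERE price BETWEEN ? AND ? ORDER BY price",
        (min_price, max_price),
    ).fetchall()
    return json.dumps(
        [{"address": a, "price": p, "lat": lat, "lng": lng}
         for a, p, lat, lng in rows]
    )
```

With the old XML files, the browser had to load everything and filter afterward; here only the rows inside the price range ever leave the server, and the front end gets an array it can use directly.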

In case you came here another way: the project is mapping sold homes in the Bay Area and can be found at http://www.BayAreaSoldHomes.com

April 06, 2006

Day One .. er .. not really

Here I am. Writing a blog. Who would have guessed it?

Today is day one of the blog, but definitely not day one of the project. I’ve been working around the clock for over a week now to get this project published. And I’m having so much freaking fun. I love this stuff. I’ve been staying up until early morning (5am on Tuesday). If I wasn’t getting so tired and making so many mistakes, I’d probably stay up for days straight.

Right now the data is complete through about August 28, 2003. I’m hoping in a couple days it’ll be done and I can go back through and figure out how to fix those addresses that didn’t Geocode okay.