Developers, Developers, Developers! Maksim Sorokin IT Blog

18 Apr 2011

Yahoo Query Language — Finding Geographic Coordinates of ZIP Codes in Denmark

This post describes a simple way to get geographic coordinates (latitude and longitude) for ZIP codes in Denmark using Yahoo Query Language (YQL).

The motivation is quite simple. One may need an estimate of the distance between two points on the map, given only their ZIP codes. Perhaps no real addresses are available, or the direct distance between two ZIP code areas is simply good enough.
There are several ways to get the coordinates. One could have somebody type every ZIP code into Google Maps and copy the coordinates from there, ask the post office for the data, or search for a ready-made list on the Internet. We will go a slightly different way and use the power of Yahoo Query Language (YQL).
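For reference, the direct distance between two coordinate pairs can be estimated with the haversine formula. The following is only a minimal sketch: it assumes a spherical Earth with a mean radius of 6371 km, and the class and method names are made up for illustration. The sample coordinates are the ones obtained for ZIP codes 1000 and 2100 at the end of this post.

```java
class DistanceSketch {

  // Mean Earth radius in kilometers (assumption: the Earth is treated as a sphere).
  static final double EARTH_RADIUS_KM = 6371.0;

  // Great-circle distance in kilometers between two points given in degrees.
  static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1);
    double dLon = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
        * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
  }

  public static void main(String[] args) {
    // Centroids for ZIP codes 1000 and 2100, taken from the output at the end of this post.
    double km = haversineKm(55.676239, 12.567470, 55.706051, 12.580990);
    System.out.printf("%.1f km%n", km);
  }
}
```

For distances of a few kilometers the spherical approximation is far more precise than the ZIP code centroids themselves, so a more elaborate Earth model would hardly pay off here.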

First, let's run a quick test. We will query Google Maps for a ZIP code and compare the coordinates with the Yahoo Query Language (YQL) result.
Let's test with Hellerup. We go to Google Maps and query for "2900, Denmark".

Now we go to the YQL Console and pick the "get san francisco geo data" example on the right. We change the query to select centroid from geo.places where text="2900, Denmark" and run it. Then we check the centroid tag in the XML output:

<centroid>
	<latitude>55.736462</latitude>
	<longitude>12.561010</longitude>
</centroid>

Then we use these coordinates in Google Maps to check the location.

One may also try the query select centroid from geo.places where text="2900, Hellerup Denmark" (with Hellerup added), which returns a location slightly further north-east. This does not really matter. First, an error of 500-1000 meters is tolerable. Second, both results can be right, since there may be no single true center of a ZIP code area.

We will use the YQL REST interface. For instance, the query URL for the Hellerup ZIP code looks like this:


http://query.yahooapis.com/v1/public/yql?q=select%20centroid%20from%20geo.places%20where%20text%3D%222900%2C%20Denmark%22
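The URL above is the plain YQL statement, percent-encoded and passed as the q parameter. As a rough sketch of how such a URL could be assembled by hand (the class and method names are illustrative, not part of any Yahoo library):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class YqlUrlSketch {

  static final String ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

  // Builds the REST URL for the centroid query of one ZIP code.
  static String buildUrl(String zipCode) {
    String yql = "select centroid from geo.places where text=\"" + zipCode + ", Denmark\"";
    try {
      // URLEncoder targets HTML forms and writes spaces as '+'.
      // A literal '+' in the input is already encoded as %2B, so the replace is safe.
      String encoded = URLEncoder.encode(yql, "UTF-8").replace("+", "%20");
      return ENDPOINT + "?q=" + encoded;
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is guaranteed to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(buildUrl("2900"));
  }
}
```

The ZIP code is passed as a string here only to keep the sketch independent of how the codes are stored.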

The HTTP response to this query is the following:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
	yahoo:count="1" yahoo:created="2011-04-18T07:34:03Z" yahoo:lang="en-US">
	<results>
		<place xmlns="http://where.yahooapis.com/v1/schema.rng">
			<centroid>
				<latitude>55.650162</latitude>
				<longitude>12.534110</longitude>
			</centroid>
		</place>
	</results>
</query>
<!-- total: 124 -->
<!-- engine5.yql.ird.yahoo.com -->
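A response like this can be read with the XML parser that ships with Java. The sketch below is one possible approach, not the method used in this post: it parses a shortened copy of the response above and takes the first latitude and longitude elements. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CentroidXmlSketch {

  // Returns {latitude, longitude} of the first centroid found in the response.
  static double[] parseCentroid(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      String lat = doc.getElementsByTagName("latitude").item(0).getTextContent();
      String lon = doc.getElementsByTagName("longitude").item(0).getTextContent();
      return new double[] { Double.parseDouble(lat.trim()), Double.parseDouble(lon.trim()) };
    } catch (Exception e) {
      // Covers malformed XML as well as responses without any centroid.
      throw new IllegalArgumentException("Not a YQL centroid response", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<query xmlns:yahoo=\"http://www.yahooapis.com/v1/base.rng\" yahoo:count=\"1\">"
        + "<results><place xmlns=\"http://where.yahooapis.com/v1/schema.rng\"><centroid>"
        + "<latitude>55.650162</latitude><longitude>12.534110</longitude>"
        + "</centroid></place></results></query>";
    double[] c = parseCentroid(xml);
    System.out.println(c[0] + "," + c[1]);
  }
}
```

Unlike a plain string search, this does not depend on how the response is broken into lines.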

Next, we need Denmark's ZIP codes. They can be found somewhere on the Internet.
We then write a simple Java program that runs the YQL REST query for every ZIP code.

In our case we have 580 ZIP codes, so for easier maintenance we move them to a separate class.

package dk.sorokin.maksim.zipCodes;

public class ZipCodes {

  static final int[] ZIP_CODES = { 1000, 1600, 2000, 2100, 2200, ...};
}

Then comes the querying class. YQL has usage limits. Since we are not using any authentication, we cannot send queries in rapid succession. Therefore, we limit ourselves to one query every 4 seconds:

package dk.sorokin.maksim.zipCodes;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;

public class App {

  private final static String YQL_ZIP_QUERY_SCHEME = "http";
  private final static String YQL_ZIP_QUERY_HOST = "query.yahooapis.com";
  private final static String YQL_ZIP_QUERY_PATH = "/v1/public/yql";
  private final static String YQL_ZIP_QUERY_QUERY = "q=select centroid from geo.places where text=\"%d, Denmark\"";

  private final static String CENTROID_TAG_OPEN = "<centroid>";
  private final static String LATITUDE_TAG_OPEN = "<latitude>";
  private final static String LATITUDE_TAG_CLOSE = "</latitude>";
  private final static String LONGITUDE_TAG_OPEN = "<longitude>";
  private final static String LONGITUDE_TAG_CLOSE = "</longitude>";

  public static void main(String[] args) throws Exception {
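    for (int zipCode : ZipCodes.ZIP_CODES) {
      // Stay under the rate limit for unauthenticated queries.
      Thread.sleep(4000);
      URL url = buildYQLURL(zipCode);
      // System.out.println(url);

      BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
      try {
        String str;
        while ((str = in.readLine()) != null) {
          // Prints "latitude,longitude," when the line holds a centroid.
          tryParsingCentroid(str);
        }
        // Finish the output row: "latitude,longitude,zipCode;".
        System.out.println(zipCode + ";");
      } finally {
        // Close the stream even if reading fails.
        in.close();
      }
    }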
  }

  private static URL buildYQLURL(int zipCode) throws Exception {
    return new URI(
        YQL_ZIP_QUERY_SCHEME,
        YQL_ZIP_QUERY_HOST,
        YQL_ZIP_QUERY_PATH,
        String.format(YQL_ZIP_QUERY_QUERY, zipCode), null).toURL();
  }

  // Assumes the whole <centroid> element arrives on a single line of the response.
  private static void tryParsingCentroid(String s) {
    if (s.contains(CENTROID_TAG_OPEN)) {
      System.out.print(
          s.substring(
              s.indexOf(LATITUDE_TAG_OPEN) + LATITUDE_TAG_OPEN.length(),
              s.indexOf(LATITUDE_TAG_CLOSE)));
      System.out.print(",");
      System.out.print(
          s.substring(
              s.indexOf(LONGITUDE_TAG_OPEN) + LONGITUDE_TAG_OPEN.length(),
              s.indexOf(LONGITUDE_TAG_CLOSE)));
      System.out.print(",");
    }
  }
}

And our output is the following:

55.676239,12.567470,1000;
55.680271,12.561970,1600;
55.681770,12.516840,2000;
55.706051,12.580990,2100;
55.697071,12.548240,2200;
55.651428,12.582240,2300;
...

What can we do with it? Import it into a database as CSV, or use regular expressions to generate SQL INSERT statements.
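The regular expression route could look roughly like the sketch below. The table name zip_codes and its column names are assumptions made for this example, as are the class and method names. Rows that carry no coordinates (for example a bare "1000;" when no centroid was found) are reported as null so they can be skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlInsertSketch {

  // One output row has the form: latitude,longitude,zipCode;
  static final Pattern ROW = Pattern.compile("^(-?\\d+\\.\\d+),(-?\\d+\\.\\d+),(\\d{4});$");

  // Returns an INSERT statement for a row, or null if the row does not match.
  static String toInsert(String row) {
    Matcher m = ROW.matcher(row.trim());
    if (!m.matches()) {
      return null;
    }
    // zip_codes and its columns are hypothetical names.
    return String.format(
        "INSERT INTO zip_codes (zip_code, latitude, longitude) VALUES (%s, %s, %s);",
        m.group(3), m.group(1), m.group(2));
  }

  public static void main(String[] args) {
    String[] rows = { "55.676239,12.567470,1000;", "55.680271,12.561970,1600;" };
    for (String row : rows) {
      System.out.println(toInsert(row));
    }
  }
}
```

Because the pattern only admits digits, dots and a minus sign, the matched groups can be placed into the statement without further escaping.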
