Dr. Mark Humphrys

School of Computing. Dublin City University.

Network programming in Java


Package:
import java.net.*;
You can use these classes to (a) communicate with any server, or (b) construct your own server.


Java network programming reference



Example - Find IP address

Find numeric (IP) host address given text address.

InetAddress.getByName(hostname)

From Graba:

 

import java.net.*;
import java.io.*;

public class ip 
{
  public static void main ( String[] args ) throws IOException 
  {
    String hostname = args[0];

    try 
    {
      InetAddress ipaddress = InetAddress.getByName(hostname);
      System.out.println("IP address: " + ipaddress.getHostAddress());
    }
    catch ( UnknownHostException e )
    {
      System.out.println("Could not find IP address for: " + hostname);
    }
  }
}

Run it:


$ javac ip.java

$ java ip www.computing.dcu.ie 
IP address: 136.206.11.240

Q. Write a program to find the text address given the numeric address.

See DNS lookup.



getLocalHost

To find your own numeric IP address in Java:
  1. getLocalHost
    Works on DCU Win.
    Works on DCU Solaris.
    Doesn't seem to work on DCU Linux. Can anyone tell me why?
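
One possible explanation for the Linux result (an assumption, not checked on the DCU machines): many Linux distributions map the machine's own hostname to a loopback address in /etc/hosts, so getLocalHost() hands back 127.0.0.1 or 127.0.1.1. A minimal sketch that prints what getLocalHost() returns and then lists the non-loopback addresses of every network interface. The class name LocalAddr and the helper nonLoopbackAddresses() are made up for illustration.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class LocalAddr
{
  // Collect every address bound to a network interface, skipping loopback ones.
  static List<InetAddress> nonLoopbackAddresses()
  {
    List<InetAddress> found = new ArrayList<>();
    try
    {
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      if (nics == null) return found;          // no interfaces at all
      for (NetworkInterface nic : Collections.list(nics))
      {
        for (InetAddress a : Collections.list(nic.getInetAddresses()))
        {
          if (!a.isLoopbackAddress()) found.add(a);
        }
      }
    }
    catch (SocketException e)
    {
      System.out.println("Could not list interfaces: " + e.getMessage());
    }
    return found;
  }

  public static void main ( String[] args )
  {
    try
    {
      System.out.println("getLocalHost: " + InetAddress.getLocalHost().getHostAddress());
    }
    catch (UnknownHostException e)
    {
      System.out.println("getLocalHost failed: " + e.getMessage());
    }

    for (InetAddress a : nonLoopbackAddresses())
    {
      System.out.println("Interface address: " + a.getHostAddress());
    }
  }
}
```

If the first line shows a 127.x address while the interface list shows a real one, the hostname mapping is the likely cause.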


Other ways to find your IP address:

  1. On DCU Linux (lab machines):
    $ ip addr list eth0
    
  2. On DCU Linux (ssh student.computing.dcu.ie):
    $ hostname -f
    shows we are in dcu.ie
    
    $ cat /etc/resolv.conf
    shows my DNS server is in 136.206
    
  3. On Windows:
    $ ipconfig
    
  4. Visit a remote site that reports your address back to you, e.g. my PHP pages.


My local host address is not to be confused with 127.0.0.1, the loopback address.



TCP Sockets

Connection-oriented.
Must explicitly socket.close()
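
Since Java 7 the close can also be left to a try-with-resources block, which calls close() on the way out even if an exception is thrown. A minimal sketch, assuming a throwaway server on the loopback interface so that it runs without any network. The class name CloseDemo and its helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseDemo
{
  // Returns true if the client socket was closed automatically on leaving the try block.
  static boolean closedAfterTry()
  {
    // A throwaway server on an ephemeral port, so there is something to connect to.
    try (ServerSocket server = new ServerSocket(0, 1, java.net.InetAddress.getByName("127.0.0.1")))
    {
      Socket kept = null;
      try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort()))
      {
        kept = s;
        System.out.println("Inside try: closed = " + s.isClosed());
      }
      // No explicit s.close() anywhere: the try block did it.
      return kept.isClosed();
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main ( String[] args )
  {
    System.out.println("After try: closed = " + closedAfterTry());
  }
}
```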

Example - Query open ports

Port scanner - look at some machines in DCU to find ports that are "open" - providing a service.

It does this by trying to open a socket to that port.

 

import java.net.*;
import java.io.*;

public class ports 
{
  public static void main ( String[] args ) throws IOException 
  {
    String hostname = args[0];

    Socket s = null;

    try 
    {
      // this is to see if host exists:
      InetAddress ipaddress = InetAddress.getByName(hostname);

      // pick one port to test (uncomment exactly one line):
      // int p =  21;    // ftp
      // int p =  22;    // ssh / sftp
      // int p =  23;    // telnet
      // int p =  25;    // smtp
      int p =  80;       // http
      // int p = 110;    // pop3
      // int p = 143;    // imap

      try
      {
        // if the connection succeeds, something is listening on that port
        s = new Socket(hostname, p);
        System.out.println("A server is running on port " + p + ".");
        s.close();
      }
      catch (IOException e)
      {
        System.out.println("No server on port " + p + ".");
      }
    }
    catch ( UnknownHostException e )
    {
      System.out.println("Could not find host: " + hostname);
    }

    // make sure the socket is closed (harmless if it already is)
    if (s != null)
    {
      try
      {
        s.close();
      }
      catch ( IOException ioEx )
      {
      }
    }
  }
}

Can now look for http servers:


$ java ports www.dcu.ie
A server is running on port 80.

$ java ports dgrayweb.computing.dcu.ie
A server is running on port 80.

$ java ports mailhost.computing.dcu.ie
A server is running on port 80.

POP3 servers (after changing p to 110 and recompiling):


$ java ports mailhost.computing.dcu.ie
A server is running on port 110.

Search for IMAP servers.

Search for ssh servers from outside DCU for:

  1. student.computing.dcu.ie

Caution when scanning ports: Some sites don't like this.
Scanning lots of ports looks like hostile intent.
If a firewall blocks a port, the program will wait until the connection attempt times out, which could take a while.
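
One way to shorten that wait is to create an unconnected Socket and call connect() with an explicit timeout. A minimal sketch, not part of the original example: the class name PortCheck, the helper isOpen() and the 2000 ms timeout are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck
{
  // Try to connect, but give up after timeoutMs milliseconds instead of the system default.
  static boolean isOpen(String host, int port, int timeoutMs)
  {
    try (Socket s = new Socket())
    {
      s.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    }
    catch (IOException e)    // refused, timed out, or host not found
    {
      return false;
    }
  }

  public static void main ( String[] args )
  {
    String hostname = args[0];
    int p = 80;              // http

    if (isOpen(hostname, p, 2000))
      System.out.println("A server is running on port " + p + ".");
    else
      System.out.println("No server on port " + p + " (or no answer within 2 seconds).");
  }
}
```

A short timeout trades accuracy for speed: a slow but working server may be reported as closed.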



Download HTTP page

From The Java Developers Almanac:

 

// download text content of URL

import java.net.*;
import java.io.*;

public class jget 
{
  public static void main ( String[] args ) throws IOException 
  {
    try 
    {
        URL url = new URL( args[0] );
    
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String str;

        while ((str = in.readLine()) != null) 
        {
          System.out.println(str);
        }

        in.close();
    } 
    catch (MalformedURLException e) {} 
    catch (IOException e) {}
  }
}

e.g. Get my latest password for how to email me:

  $ java jget "http://computing.dcu.ie/~humphrys/howtomailme.html"


Q. Download to file.

Q. Parse to extract password.

Q. Insert error statements into the 2 catch sections.
Note where the following are caught:

  1. Bad URL syntax.
  2. Host does not exist.
  3. Host exists but URL not found.


Get HTTP headers

From The Java Developers Almanac:

 

// get the HTTP headers 

import java.net.*;
import java.io.*;

public class jhttp
{
  public static void main ( String[] args ) throws IOException 
  {
    try 
    {    
      URL url = new URL( args[0] );

      URLConnection c = url.openConnection();
    
      for (int i=0; ; i++) 
      {
            String name = c.getHeaderFieldKey(i);
            String value = c.getHeaderField(i);
    
            if (name == null && value == null)     // end of headers
            {
              break;         
            }

            if (name == null)     // first line of headers
            {
              System.out.println("Server HTTP version, Response code:");
              System.out.println(value);
              System.out.print("\n");
            }
            else
            {
              System.out.println(name + "=" + value);
            }
      }
    } 
    catch (Exception e) {}
  }
}

Output:


Server HTTP version, Response code:
HTTP/1.1 200 OK

Date=Mon, 22 Nov 2004 11:43:09 GMT
Server=Apache/2.0.47 (Unix) PHP/5.0.2
Last-Modified=Thu, 18 Nov 2004 10:32:20 GMT
ETag="19495e-3cd-e7abf500"
Accept-Ranges=bytes
Content-Length=973
Keep-Alive=timeout=15, max=100
Connection=Keep-Alive
Content-Type=text/html; charset=ISO-8859-1



HTTP headers.
Request - sent by client.
Response - returned by server.


Page not found

If the file is not found you will normally get 404, though there are some other possibilities:

http://computing.dcu.ie/BADPAGE will give something like:


Server HTTP version, Response code:
HTTP/1.1 404 Not Found

Date=Mon, 22 Nov 2004 12:15:27 GMT
Server=Apache/2.0.47 (Unix) PHP/5.0.2
Content-Length=318
Keep-Alive=timeout=15, max=100
Connection=Keep-Alive
Content-Type=text/html; charset=iso-8859-1

Q. Write a program to check if a URL exists and return yes/no.



HTTP response codes.


My 404 re-direct

Note that I catch all errors on my site with a re-direct to a script, so you get 200 for all requests on my site, good or bad, for reasons explained at that link.

http://computing.dcu.ie/~humphrys/BADPAGE will give something like:


Server HTTP version, Response code:
HTTP/1.1 200

Date=Mon, 22 Nov 2004 12:12:20 GMT
Server=Apache/2.0.47 (Unix) PHP/5.0.2
Keep-Alive=timeout=15, max=100
Connection=Keep-Alive
Transfer-Encoding=chunked
Content-Type=text/html; charset=ISO-8859-1
content-length=1270



The "URL" class hides the sockets

The URL class hides the socket that is created underneath.

Here we open a socket directly, send an HTTP GET command and read the results:

 

// HTTP GET through socket, not through "URL" class

import java.net.*;
import java.io.*;

public class sget 
{
  public static void main ( String[] args ) throws IOException 
  {
    Socket s = null;

    try 
    {
	String host = "computing.dcu.ie";
	String file = "/~humphrys/howtomailme.html";
	int port = 80;
    
	s = new Socket(host, port);

	OutputStream out = s.getOutputStream();
	PrintWriter outw = new PrintWriter(out, false);
	outw.print("GET " + file + " HTTP/1.0\r\n");
	outw.print("Accept: text/plain, text/html, text/*\r\n");
	outw.print("\r\n");
	outw.flush();

	InputStream in = s.getInputStream();
	InputStreamReader inr = new InputStreamReader(in);
	BufferedReader br = new BufferedReader(inr);
	String line;
	while ((line = br.readLine()) != null) 
	{
		System.out.println(line);
	}
	// br.close();		// Q. Do I need this?
    } 
    catch (UnknownHostException e) {} 
    catch (IOException e) {}

	if (s != null)
	{
		try
		{
			s.close();
		}
		catch ( IOException ioEx ) {}
	}
  }
}

flush() - send this now.
The PrintWriter buffers what you print. TCP itself also sends a variable number of bytes, and may hold bytes back (to collect a larger amount) before sending.
flush() tells the writer to hand over what it has now.

Output:

$ java sget
HTTP/1.1 200 OK
Date: Mon, 22 Nov 2004 15:14:10 GMT
Server: Apache/2.0.47 (Unix) PHP/5.0.2
Last-Modified: Thu, 18 Nov 2004 10:32:20 GMT
ETag: "19495e-3cd-e7abf500"
Accept-Ranges: bytes
Content-Length: 973
Connection: close
Content-Type: text/html; charset=ISO-8859-1


(the URL content)
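
The effect of flush() in the sget program can be seen without a network. A minimal sketch, assuming a ByteArrayOutputStream standing in for the socket's output stream. The class name FlushDemo and its helper are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushDemo
{
  // Returns { bytes delivered before flush(), bytes delivered after flush() }.
  static int[] bytesBeforeAndAfterFlush()
  {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();   // stands in for the socket
    PrintWriter outw = new PrintWriter(sink, false);            // false = no automatic flush

    outw.print("GET / HTTP/1.0\r\n");
    outw.print("\r\n");
    int before = sink.size();      // the request is still sitting in the writer's buffer

    outw.flush();
    int after = sink.size();       // now it has been handed on

    return new int[] { before, after };
  }

  public static void main ( String[] args )
  {
    int[] n = bytesBeforeAndAfterFlush();
    System.out.println("Delivered before flush: " + n[0]);
    System.out.println("Delivered after flush:  " + n[1]);
  }
}
```

Without the flush() the server would never see the blank line that ends the request, and both sides would sit waiting.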



HTTP methods.
HEAD - can be used to test a URL existence without downloading.
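
A minimal sketch of a HEAD request, assuming HttpURLConnection is used instead of a raw socket. The class name UrlExists, the helper exists() and the 5 second timeouts are illustrative choices, and "exists" is taken to mean a 200 response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class UrlExists
{
  // Send HEAD (headers only, no body) and report whether the server answered 200 OK.
  static boolean exists(String address)
  {
    try
    {
      URLConnection conn = new URL(address).openConnection();
      if (!(conn instanceof HttpURLConnection)) return false;   // not an http(s) URL

      HttpURLConnection c = (HttpURLConnection) conn;
      c.setRequestMethod("HEAD");
      c.setConnectTimeout(5000);
      c.setReadTimeout(5000);

      int code = c.getResponseCode();
      c.disconnect();
      return code == HttpURLConnection.HTTP_OK;
    }
    catch (IOException e)      // bad URL syntax, unknown host, refused, timed out
    {
      return false;
    }
  }

  public static void main ( String[] args )
  {
    System.out.println(exists(args[0]) ? "yes" : "no");
  }
}
```

A site that answers 200 for every request, such as the 404 re-direct described above, will always give "yes".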


Sending a HTTP POST request

e.g. Sending multiple lines of data through a HTML Form.
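
A minimal sketch of a POST, assuming the form handler accepts application/x-www-form-urlencoded data. The class name PostForm, the field name "message" and the sample text are made up. The address of the form handler is passed on the command line.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PostForm
{
  // Build one name=value pair in application/x-www-form-urlencoded form.
  static String encode(String name, String value)
  {
    return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
         + URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  public static void main ( String[] args ) throws IOException
  {
    URL url = new URL( args[0] );                  // address of the form handler

    // multiple lines of data travel as one encoded field
    byte[] body = encode("message", "line one\nline two").getBytes(StandardCharsets.UTF_8);

    HttpURLConnection c = (HttpURLConnection) url.openConnection();
    c.setRequestMethod("POST");
    c.setDoOutput(true);                           // we are going to send a body
    c.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

    try (OutputStream out = c.getOutputStream())
    {
      out.write(body);
    }

    System.out.println("Response code: " + c.getResponseCode());
  }
}
```

Unlike GET, the data goes in the request body after the headers, not in the URL.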


telnet to HTTP

HTTP commands are all plain text, so you can just telnet to port 80 and type them:

$ telnet www.computing.dcu.ie 80
GET /index.html HTTP/1.1
Host: www.computing.dcu.ie

(blank line to end header)



Write your own client to control ftp, telnet, POP3 ..

We have seen how to write your own http client, using the URL class, or using sockets directly.
Now your program can control http.

You can study the commands of any other service and write a client for that too.
Use a socket to connect to the port and then send the appropriate commands.
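
As a sketch of the idea, here is the smallest possible POP3 conversation: connect to port 110, read the server's greeting, send QUIT and read the reply. POP3 replies begin with "+OK" or "-ERR". The class name Pop3Hello and the helper isOk() are hypothetical, and the host is whatever POP3 server you pass on the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class Pop3Hello
{
  // POP3 replies start with "+OK" on success and "-ERR" on failure.
  static boolean isOk(String reply)
  {
    return reply != null && reply.startsWith("+OK");
  }

  public static void main ( String[] args ) throws IOException
  {
    String host = args[0];
    int port = 110;        // pop3

    try (Socket s = new Socket(host, port))
    {
      BufferedReader br = new BufferedReader(new InputStreamReader(s.getInputStream()));
      PrintWriter outw = new PrintWriter(s.getOutputStream(), false);

      String greeting = br.readLine();          // in POP3 the server speaks first
      System.out.println(greeting);

      if (isOk(greeting))
      {
        outw.print("QUIT\r\n");                   // end the session politely
        outw.flush();
        System.out.println(br.readLine());
      }
    }
  }
}
```

A real client would send USER, PASS, LIST, RETR and so on between the greeting and QUIT, checking each reply the same way.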



Sites that restrict scripts

Some sites don't provide content to scripts, only to browsers. For example:

  1. Write Java program to download the Google home page. This is ok.
  2. Write Java program to download the result of a Google search. It will be blocked.

Solutions:

  1. Set user agent to pretend to be a browser

    This is a bit cheeky, but should be ok if you don't hit the site too often. That is, the remote site is asking you not to hit them with a script many times. They won't mind the occasional scripted hit. But respect their wishes by making sure you don't hit them many times or they may block your IP address.

  2. Google Code - The correct way to interact with Google via script.

YouTube searches can be scripted.



How to set user agent to pretend to be a browser

Set the http.agent system property when starting Java:

On Windows:

$ java  "-Dhttp.agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"   prog

On Linux:

$ java  -Dhttp.agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"   prog
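
The header can also be set inside the program, per connection, with setRequestProperty(). A minimal sketch, assuming the same browser string as above. The class name AgentDemo and the helper prepare() are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

public class AgentDemo
{
  static final String AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";

  // Prepare a connection that will send our User-Agent header. Nothing is sent yet.
  static URLConnection prepare(String address)
  {
    try
    {
      URLConnection c = new URL(address).openConnection();
      c.setRequestProperty("User-Agent", AGENT);
      return c;
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main ( String[] args ) throws IOException
  {
    URLConnection c = prepare(args[0]);

    // the request goes out, with our header, when we ask for the input stream
    try (BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream())))
    {
      String str;
      while ((str = in.readLine()) != null)
      {
        System.out.println(str);
      }
    }
  }
}
```

The same caution applies: keep the number of scripted hits low.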




