Like most other languages, R provides interfaces for reading data, called connection interfaces (or simply connections). A connection can be made to several different kinds of sources. Let's look at a few of them:
- file – make a connection to a file
- url – make a connection to a URL
- gzfile, bzfile – make a connection to compressed (gzip or bzip2) files
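As a quick check of the list above, the sketch below creates each kind of local connection to a temporary file and prints its class. Every constructor returns an object that inherits from "connection". The temporary file names come from tempfile() and are purely illustrative; url() is left out because it needs a reachable address.

```r
# Hypothetical temporary files, used only for illustration
plain <- tempfile(fileext = ".txt")
gz    <- tempfile(fileext = ".gz")
bz    <- tempfile(fileext = ".bz2")

# Creating a connection does not open it or touch the file yet
cons <- list(file = file(plain), gzfile = gzfile(gz), bzfile = bzfile(bz))

# Each object carries its own class plus the common "connection" class
for (name in names(cons)) {
  cat(name, ":", class(cons[[name]]), "\n")
}

# Destroy the connections once we are done with them
invisible(lapply(cons, close))
```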
File Connection
R functions often take a lot of arguments, but most function and argument names are very straightforward. For a clear understanding of each argument, check the manual pages (for example, ?file), where they are documented clearly.
In my “Read/Write Data into ‘R’ Language” post I discussed some essential functions without writing any connection code. That is because we rarely need to deal with connections directly: many functions create, open and close them internally. So when reading or writing a file, we usually don't need to think about connections at all.
> data <- read.csv("test.txt") ## read.csv() handles the connection itself
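To make the one-liner above reproducible, the sketch below first writes a small CSV file (a hypothetical test.csv in a temporary directory, standing in for test.txt, with made-up sample values) and then reads it back with read.csv(), passing only the file name. No connection code appears: read.csv() opens and closes the file itself.

```r
# Hypothetical sample file standing in for test.txt
path <- file.path(tempdir(), "test.csv")
writeLines(c("id,value", "1,10", "2,20"), path)

# read.csv() creates, opens and closes the connection internally
data <- read.csv(path)
print(data)
```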
Let's look at the code below to understand how to create a connection to the file explicitly.
> con <- file("test.txt")
> open(con, "r") ## Open connection to 'test.txt' in read-only mode
> data <- read.csv(con) ## Read from the connection
> close(con) ## Close the connection
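One practical reason to open a connection explicitly is that an open connection remembers its position between calls. The sketch below (using a hypothetical temporary file) reads the same connection in two chunks with readLines(); because the connection stays open, the second call continues where the first one stopped instead of starting again at the top.

```r
# Hypothetical sample file with four lines
path <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2", "line 3", "line 4"), path)

con <- file(path)
open(con, "r")                    ## open in read-only mode

first  <- readLines(con, n = 2)   ## reads the first two lines
second <- readLines(con, n = 2)   ## continues with the next two lines

close(con)                        ## close and destroy the connection
```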
URL Connection
The readLines() function is useful once you have made a connection to the data location. After creating a URL connection, we can use readLines() to read the web page line by line.
> con <- url("http://renien.com/blog/hello-r-world/", "r") ## open the URL connection
> lines <- readLines(con) ## read the webpage line by line
> head(lines) ## print few lines
[1] "<!doctype html>"
[2] "<!--[if lt IE 7]><html class=\"no-js lt-ie9 lt-ie8 lt-ie7\" lang=\"en\"> <![endif]-->"
[3] "<!--[if (IE 7)&!(IEMobile)]><html class=\"no-js lt-ie9 lt-ie8\" lang=\"en\"><![endif]-->"
[4] "<!--[if (IE 8)&!(IEMobile)]><html class=\"no-js lt-ie9\" lang=\"en\"><![endif]-->"
[5] "<!--[if gt IE 8]><!--> <html class=\"no-js\" lang=\"en\"><!--<![endif]-->"
[6] "<head>"
> close(con) ## close the URL connection
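Reading a live web page depends on network access. As an offline sketch, the code below points url() at a local file through a file:// URL (the temporary file and its HTML content are made up for illustration) and then reads it line by line with readLines(), just as with an http:// address. It assumes a Unix-like absolute path from tempfile(); on Windows the URL would need the file:/// form with forward slashes.

```r
# Hypothetical local HTML file standing in for a web page
page <- tempfile(fileext = ".html")
writeLines(c("<!doctype html>", "<html>", "<head>", "</head>", "</html>"), page)

# Build a file:// URL from the absolute path (Unix-like systems)
con <- url(paste0("file://", normalizePath(page)), "r")
lines <- readLines(con)   ## read the "page" line by line
close(con)                ## close the URL connection

head(lines, 3)
```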
Connection to Compressed Files
To open a .gz file, we can use the gzfile() interface.
> con <- gzfile("test.gz") ## create a connection to the compressed file
Once the connection to the compressed file is created, we can use the readLines() function to read its content line by line, just as from a plain text file; gzfile() decompresses the data transparently.
> lines <- readLines(con, 2) ## read the first two lines
> lines
[1] "Renien" "Joseph"
> close(con) ## close the connection
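For a self-contained version of the compressed-file example, the sketch below writes a small gzip file through a gzfile() connection opened in write mode and then reads the first two lines back. The file path and its contents (the two names from the output above plus one extra line) are hypothetical sample data; bzfile() could be swapped in the same way for a .bz2 file.

```r
# Hypothetical compressed sample file
gz_path <- tempfile(fileext = ".gz")

# Write three lines through a gzip connection opened for writing ("w")
out <- gzfile(gz_path, "w")
writeLines(c("Renien", "Joseph", "R connections"), out)
close(out)

# Read the first two lines back; since the connection was not opened
# explicitly, readLines() opens it only for the duration of the call
con <- gzfile(gz_path)
first_two <- readLines(con, 2)
close(con)   ## destroy the connection

print(first_two)
```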