Avi Bryant avibryant
avibryant / Main.java
Created July 17, 2012 21:50 — forked from ceteri/ Main.java
Cascading for the Impatient, part 3
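import com.twitter.scalding._

// Word count over a TSV of (doc_id, text) rows.
class WordCount(args : Args) extends Job(args) {
  Tsv(args("input"), ('doc_id, 'text))
    // Split each line on spaces, brackets, parentheses, commas, and periods.
    .flatMapTo('text -> 'token){line : String => line.split("[ \\[\\]\\(\\),.]")}
    // Normalize tokens, then drop the empty ones left by adjacent delimiters.
    .map('token -> 'token){token : String => token.trim.toLowerCase}
    .filter('token){token : String => token.length > 0}
    // Count occurrences of each token.
    .groupBy('token){g => g.size}
    .write(Tsv(args("output")))
}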
class File
  def seek_to(str)
    until eof?
      start = pos
      buf = read(10000)
      if(offset = buf.index(str))
        seek(start + offset + str.size)
        return true
      else
        seek(start + 5000)