Hadoop Summit
James's notes:
Hadoop: A Brief History
· Doug Cutting
· Started with Nutch, 2002 to 2004
  o Initial goal was web-scale, crawler-based search
  o Distributed by necessity
  o Sort/merge-based processing
  o Demonstrated on 4 nodes over 100M web pages
  o Was operationally onerous
  o "Real" web scale was still a ways away
· 2004 through 2006: Gestation period
  o GFS and MapReduce papers published (they addressed the scale problems we were having)
  o Added DFS and MapReduce to Nutch
  o Two part-time developers over two years
  o Ran on 20 nodes at the Internet Archive (IA) and UW
  o Much easier to program and run
  o Scaled to several hundred million web pages
· 2006 to 2008: Childhood
  o Y! hired Doug Cutting and a dedicated team to work on it, reporting to E14 (Eric Baldeschwieler)
  o Hadoop project split out of Nutch
  o Hit web scale in 2008