More (and almost the last) highlights...
The O'Reilly P2P Conference -- San Francisco, CA. Feb. 14 - 16, 2001
On this one:
- the post-spider world
- How Gnutella happened
- characterizing p2p infrastructures
- Larry Lessig
- What we are learning from Gnutella
- accountability and performance
2/15/01:
The post-spider world
Searching in p2p systems: if you want decentralization, how will you
search through the p2p network to find something important? Metadata!!
- A distributed Google: Doogle.
- How will anyone make sense of stuff "tagged with incorrect metadata"?
And not just that: currently, your regular user doesn't even bother to tag
their files.
DW: These are problems we've been addressing at USU. At least in the
educational domain, the creation of metadata is highly automatable. Our
estimate is that of approximately 15 metadata fields ("Author" and
"Publication Date" being two examples) all but two can be automatically
completed in a meaningful manner. As for whether or not a distributed
Google is desirable, see my comments above. And no one can make sense of
bad metadata. Garbage in, Garbage out.
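As a rough illustration of what "automatable" can mean here, the sketch
below fills in the fields that can be read straight off a file and leaves
the rest empty for a human. The field names, the auto_metadata function,
and the choice of which fields stay manual are all assumptions made for
this example; they are not the USU field list.

```python
import mimetypes
import os
from datetime import datetime, timezone

def auto_metadata(path):
    """Build a metadata record, deriving what we can from the file itself.

    Field names are hypothetical; they only illustrate the split between
    automatically completed and manually completed fields.
    """
    stat = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        # Derived automatically from the file system and file name
        "title": os.path.splitext(os.path.basename(path))[0],
        "format": mime or "application/octet-stream",
        "size_bytes": stat.st_size,
        "modified": modified.isoformat(),
        # Assumed to need a human in this sketch
        "author": None,
        "description": None,
    }
```

The quality of the record still depends on the inputs: a badly named file
yields a bad title, which is the "garbage in, garbage out" point again.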
SP: Speaking of p2p searching apps
http://news.cnet.com/news/0-1005-200-4950537.html
******************************************************************
How Gnutella happened - Gene Kan
- Gnutella IS NOT another Napster (also, not to replace Napster)
- Some recent problems with Gnutella:
- Ping flooding, query flooding, slow hosts acting as hubs, bugs (TTL,
broadcast push requests), scaling, freeloading.
- Gnutella initially developed for small communities
- Scalability.
- Commercial efforts to the rescue:
- clip2.com (reliable host finder gnutellahosts.com)
- LimeWire, CXC and BearShare: release high-quality client software
- clip2.com creates Reflector, a mini-Napster brokerage that turns Gnutella
into a hybrid centralized-decentralized model
- Has Gnutella been successful? YES! (according to Kan)
Future challenges:
- scaling
- maintaining benevolent commercial involvement
- expansion into non-file sharing applications
******************************************************************
Characterizing P2P Infrastructure - Wes Felter
- A pretty good overview on the different p2p technical designs that are
out there (for apps. like Napster, Freenet, Gnutella and Mojo Nation). An
overview on the design features that are common to many p2p systems with
pros and cons. Discussion on naming, routing, messaging and searching for
some p2p systems as well as some points on the issue of interoperability
vs. standardization.
- The presentation is available at
http://www.cs.utexas.edu/users/wesf/P2PInfrastructure.html
******************************************************************
2/16/01:
Keynote: Lawrence Lessig
If you didn't read my last post on him, go read it! :-)
I will also encourage you to "hear" his speech, or better yet, read his
book "Code and Other Laws of Cyberspace".
The actual speech is also available (in real) at
http://www.technetcast.com/tnc_catalog.html?item_id=1171
******************************************************************
What we are learning from Gnutella (Kelly Truelove) - Clip2.com
Brief history of Gnutella and the technology behind it:
- Project not open source originally; it was reverse engineered.
Gnutella's recent meltdown:
- Gnutella was originally designed for a network of 10^2 - 10^3 peers
- Pings and queries are broadcast and used to discover hosts and
files; other message types, including responses, are routed. (messages are
dropped after some predefined number of relays).
- Dialup users cannot keep up with bandwidth requirements
- Bandwidth!
- Scalability issues.
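The broadcast behavior described above can be sketched as a small
simulation. This is a toy model, not the Gnutella wire protocol: the
topology, the flood function, and the TTL values are assumptions for
illustration. Each peer relays a query to every neighbor except the one it
came from, a peer that has already seen the query does not relay it again,
and the query is dropped once its TTL runs out.

```python
from collections import deque

def flood(graph, origin, ttl):
    """Broadcast a query from `origin`; return (peers reached, messages sent).

    `graph` maps each peer to a list of its neighbors (hypothetical topology).
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: do not relay any further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # never echo the query back to its sender
            messages += 1  # every relay costs bandwidth, even duplicates
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return len(seen) - 1, messages

# Usage: a four-peer ring, queried with two different TTLs.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
reached_short, sent_short = flood(ring, 0, 1)
reached_long, sent_long = flood(ring, 0, 2)
```

Even in this tiny ring the message count exceeds the number of peers
reached once paths overlap, which hints at why broadcast traffic becomes a
problem for dialup users as the network grows.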
Ways out:
- Introduced the Reflector (could not change the code in the peers; only
user behavior). Released October 4. Reflector is a proxy and index server
(a la Napster). Reflector also acts as a super node/peer (proxying the
nodes connected to it) and reduces bandwidth use.
- Here comes the second generation of peers:
- LimeWire: Nov. 1
- BearShare: Dec. 4
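The "proxy and index server" idea can be sketched as follows. This is only
a guess at the general shape, not Clip2's implementation: the Reflector
class below, its method names, and the substring matching are assumptions.
The point it illustrates is that leaf peers upload their file lists once,
and the index then answers queries without relaying them to the leaves.

```python
class Reflector:
    """Toy index server that answers queries on behalf of its leaf peers."""

    def __init__(self):
        self.index = {}  # leaf id -> list of shared file names

    def register(self, leaf, files):
        # A leaf uploads its file list once, when it connects.
        self.index[leaf] = list(files)

    def unregister(self, leaf):
        # Forget a leaf's files when it disconnects.
        self.index.pop(leaf, None)

    def query(self, term):
        # Answer from the local index. The query is never relayed to the
        # leaves, which is where the bandwidth saving comes from.
        term = term.lower()
        return sorted(
            (leaf, name)
            for leaf, files in self.index.items()
            for name in files
            if term in name.lower()
        )
```

A slow modem user behind such a node sees only the traffic for its own
uploads and downloads, at the price of reintroducing a central point.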
Numerous performance-improving measures:
- Connection-management rule: the next generation of Gnutella will have
better connection-management rules; to prevent bandwidth hogging, quiet or
unresponsive hosts will be dropped from the network.
- Self-organization results
- Creation of "super-peers" - a peer in the Gnutella network that
decreases the amount of bandwidth used across the network due to different
connection speeds (e.g., modem vs. broadband users). See notes on
Reflector.
- Scalability: for Gnutella it means "keep the subset of the network you
can see good enough."
- So a big challenge for Gnutella developers is to keep scaling "good
enough" to meet user expectations.
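The rule about dropping quiet hosts can be sketched as a simple idle
timeout. The notes do not say how clients decide that a host is
unresponsive, so the criterion here (seconds since the last message), the
prune_connections function, and the threshold are all assumptions.

```python
def prune_connections(last_heard, now, max_idle):
    """Split peers into (kept, dropped) by time since their last message.

    `last_heard` maps a peer id to the timestamp of its last message.
    A peer silent for more than `max_idle` seconds is dropped.
    """
    kept, dropped = [], []
    for peer, timestamp in sorted(last_heard.items()):
        if now - timestamp > max_idle:
            dropped.append(peer)
        else:
            kept.append(peer)
    return kept, dropped

# Usage: peer "b" has been silent for 60 seconds and is dropped.
kept, dropped = prune_connections({"a": 100, "b": 40, "c": 95}, now=100, max_idle=30)
```

Freeing the slot lets the client connect to a more responsive host, which
is one way a network can self-organize around its faster peers.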
Some interesting figures from presentation:
Number of Unique [Gnutella] users: (approximations)
10/1 - 20,000
11/12 - [scour shutdown] 30,000
12/10 - [updates to gnutella.wego.com] 20,000
2/4/01 - [Napster's ruling] 180,000
- experiencing 30% daily growth
- Is there a limit?
From 10^5 to 10^6+ ... ?
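To put the growth figure in perspective, here is a quick compound-growth
calculation. It assumes the 30% daily rate stays constant, which the talk
did not claim; the function name is made up for this example.

```python
import math

def days_to_reach(current, target, daily_growth):
    """Days of compound growth needed to go from `current` to `target`."""
    return math.log(target / current) / math.log(1 + daily_growth)

# From the 2/4/01 figure of 180,000 users to 10^6 at 30% per day.
days = days_to_reach(180_000, 1_000_000, 0.30)
```

At that rate the network would pass a million users in roughly six and a
half days, so the rate cannot hold for long and the "is there a limit?"
question is a real one.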
******************************************************************
Accountability and performance
Freenet: caching network
Gnutella: data is stored only on the publisher's own computer
Publius: limits files to 100k
Metrics
- Mojo Nation: micropayments
- Free Haven: reputation system - reliable space
- Mixmaster: statistics pages track uptime
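A micropayment scheme can be pictured as a ledger in which peers earn
credit by serving data and spend it by downloading. The sketch below is not
Mojo Nation's protocol: the CreditLedger class, the starting credit, and
the prices are hypothetical, and a real system would have to decide who
keeps the ledger and how to stop peers from forging entries.

```python
class CreditLedger:
    """Toy accounting: downloads cost credit, uploads earn it."""

    def __init__(self, starting_credit=0):
        self.starting_credit = starting_credit
        self.balance = {}  # peer id -> current credit

    def _get(self, peer):
        # New peers start with the configured credit.
        return self.balance.setdefault(peer, self.starting_credit)

    def transfer(self, downloader, uploader, price):
        """Pay `price` from downloader to uploader; refuse if underfunded."""
        if self._get(downloader) < price:
            return False  # a freeloader with no credit gets nothing
        self._get(uploader)  # make sure the uploader has an account
        self.balance[downloader] -= price
        self.balance[uploader] += price
        return True
```

A peer that never serves anything eventually runs out of credit, which is
how this kind of metric addresses freeloading.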
Accountability hard b/c:
- tragedy of commons
- p2p discourages permanent public identification
- hard to assess a peer's history or to predict future performance
- legal contracts obsolete
DW: In the educational domain, these problems are also easy to solve. An
anonymous educational resources sharing mechanism would be nothing more
than a haven of plagiarism (note this is not about violating someone
else's "IP rights", it is about representing your own work as your own,
and someone else's as someone else's -- this is an issue for students,
teachers, and researchers alike). Therefore, our educational
resource-sharing p2p system will require real world id's from everyone
contributing. Resource downloading can be semi-anonymous, but original
sharers must identify themselves in order to facilitate appropriate
citation later on. You'd be amazed how many problems disappear when
anonymity and censorship-proofing (we supposedly have academic freedom,
right?) are not the primary design criteria.
Problems:
- intentional attacks
- user attacks (DoS), storage flooding, computational overload
- server attacks - low quality service (e.g. dropping data)
Current problems:
Freenet: bandwidth overuse, cache flushing (data flooding)
Gnutella: vulnerable to query flooding, freeloading
Publius:
Mojo: how to set prices, performance tracking not reputation
free haven: very vulnerable to query flooding, protected to query flooding
Mixmaster: no verifiability