Huafeng Xie, Physics
Faculty Advisor: Brian Schwartz
- CiteRank, a network based ranking method for citation networks, in preparation.
- Finding scientific gems with Google, P. Chen, H. Xie, S. Maslov, S. Redner, April 2006, physics/0604130.
- Optimal ranking in networks with community structure Huafeng Xie, Koon-Kiu Yan, Sergei Maslov, Accepted by Physica A, May 2006
- Effects of community structure on search and ranking in information network, H. Xie, K.-K.Yan, and S. Maslov, proceedings of the winter school “Dynamics of Complex Interconnected Systems: Networks and Bioprocesses, Geilo 2005” published by Kluwer Academic Publishers, Dordrecht (2006).
“Google unearths physics gems” An article about our work on citation networks was featured on Physicsweb.org on 04/21/2006
“Google could be a good way of measuring the “impact” of a particular scientific paper and might even be used to replace traditional citation indices, according to a new statistical analysis by physicists in the US. The researchers have found that the Google PageRank algorithm, which measures the relative importance of Web pages, can provide a systematic way to find important papers. The technique also uncovers scientific “gems” — top papers overlooked by conventional searches (physics/0604130). “
Networks are the backbone of complex systems that consists of many components. Understanding the structure of the networks would greatly help us to understand the way the components interact with each other and the intrinsic properties of the complex system. Networks of interest vary from the Internet to the World Wide Web to scientific collaborations networks.
There are basically three kinds of complex networks in terms of the geographical properties of the links: undirected, directed and weighted networks. For example, the Internet is a complex network of routers and computers with physical links. The links (network cables, fiber cables, phone lines, etc) have no direction, meaning information can go through the link either way. The World Wide Web is a typical directed network. The nodes of the network are the web pages and the edges are the hyper links from one page to anther. These links share a common character that they have no length. For example, once a hyper link from web page A to web page B is created, a directed link from node A to B is added in the network model. There is a third kind of network whose links have “length” or “weight”.
While M. Newman proposed a weighted model in scientific collaboration networks, the models for weighted networks actually haven’t been explored much. A further class of networks, bipartite networks, have two different kinds of nodes. For example the actor and movie networks where an actor is considered connected to a movie if he/she is cast in the movie. Then in the network we have movies and actors as nodes, who can connect each other but not movie-movie or actor-actor.
We are currently working on two topics: gradient networks and ranking of web pages. Gradient networks use some concepts from the fitness model and are analogous to a hierarchical-religion systems. The idea is that we start with a random relationship network, in which every node represents a person and a link is connected to other persons he is acquainted with. Then an authority coefficient is assigned to every node. Every person checks the acquaintances and makes choice of following the one (or the two persons) with the highest authority coefficient, forming directed links from the person to the leaders/mentors.
We propose study the statistical properties of this leadership network. We anticipate that this model can be used to explain the evolution of some directed scale-free networks.
The other topic we are working on is ranking of web pages. There’s no doubt that Google is the most successful search engine on the World Wide Web nowadays. Thousands and thousands of people use it every day and it has tremendous influence. The great success of Google is to a large extend due to their innovative page ranking algorithm PageRank. Actually this PageRank utilizes the network concept to obtain the rank of the pages, which decides the order of displaying the search results. This is a vivid example of the power of understanding the principles of networks. The definition of PageRank can be understood as a random walk on a graph. A large amount of random walkers keep walking along links on the www network. All the walkers have a small probability E that they will jump to some random page which is not connected with the current open page. The probability E is used to account for the behavior that when people surf online, they periodically “get bored” and jump to a random page. The rank of a page is proportional to the standing probability distribution of the random walkers on the nodes. We are studying the robustness of this ranking system, exploring the possibility that the rank of pages can be changed by a significant amount by only adding or removing a small amount of links between the pages.