TIM Graduate Student Research Symposium
Date: 9am-5pm, Monday, Oct. 5, 2009
Location: Room 599, Engineering 2
TIM Student Research Symposium (SsRS) is a one-day series of presentations designed to increase awareness of the diverse aspects of TIM research conducted by students supervised by professors in the Technology and Information Management Program. SRS presentations also provide students with an opportunity to share their work with other TIM members, and learn about other student research in TIM
Speakers: Knowing the audiovisual set-up will prevent difficulties with your presentation and contribute significantly to the success of the symposium. Speakers are expected to produce a PowerPoint presentation to accompany their speech, as a visual reference for the audience. A projector and a laser pointer will be provided. Please bring your own laptop for the presentation. Please come to make sure your laptop works with the projector during the break before your session starts (this is a common practice in conferences). If you don't have laptop, please email professor Zhang yiz@soe your powerpoint slides 24 hours before the symposium.
SCHEDULE
Breakfast (provided)
09:00am 09:05am Openning
09:05 Joel Barajas, Karla L Caballero and Ram Akella. Generalized Dirichlet Distribution in Topic Modeling
and Yi Zhang, Collaborative Filtering on the Epinion Data
. Semantic Web for Search
. Automatic Generation of Ontologies Through Vectorized Features
Yi Sun and John Musacchio. Market Mechanisms for Near-Real-Time Landing and Takeoff Slot Allocation
. Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling
. Dynamic Recommender System in Citation Networks
. Ordering Innovators and Laggards for Product Categorization and Recommendation
. Model of an International Environmental Agreement among Asymmetric Nations applied to Debris Mitigation
. User Profile Initialization for Personalized Search and Filtering
. A Strategy and Framework of an Automatic Pricing Optimization System for Competitive Advantage in an E-Retailing Environment
. Systems Perspectives on Service Science
Incorporating (Multiple) Criteria into Information Retrieval
. Contextual Image Retrieval with Relevance Feedback
04:05pm - 04:30pm Shuang Wu and John Mussachio, The Effect of Incomplete Information and Collusion in Upgrade Decision Games
04:30pm - 04:55pm Yize Li and Yi Zhang. Integrating implicit feedback for recommender systems
5:00pm Conclusion and Award
ABSTRACTS
Joel Barajas, Karla L Caballero and Ram Akella. Generalized Dirichlet Distribution in Topic Modeling
Abstract. Topic decomposition techniques have been successfully used in many applications including retrieval and blog mining. Correlation between topics has been addressed before by using a logistic Normal distribution in topics which allows a covariance matrix inference. However, it turns out to be a non-conjugate prior for multinomial distribution which requires approximations and a more complicated model. In addition, it suffers from over fitting since the number of parameters in the covariance matrix grows significantly according to the number of topics. In this presentation, we expand Bayesian document models beyond the traditional LDA model. Our contributions are two-fold: 1. Use of a DCM (Dirichlet Compound Multinomial) distribution to better model the word generation probability matrix in place of the usual multinomial model for modeling documents. 2. Use of the Generalized Dirichlet distribution to model the probability of topics in each document. This distribution allows us to account for correlations between topics and it is still a conjugate prior of the multinomial distribution. For the inference task, we use Gibbs sampling and stochastic EM (Expectation Maximization) for parameter estimation. Some preliminary results are discussed using the New York Times dataset from TREC 7 and the NIPS dataset.
Jiazhong Nie and Yi Zhang. Learning Hidden Representation of Words from Multi-tasks
Abstract. In the last decade, hidden representation oriented language learning models
were developed fruitfully. The idea became well-known with the Latent
Semantic Analysis(LSA) model which is wildly applied in the information
retrieval. Based on LSA, Probabilistic LSA model integrates the hidden
representation with the probabilistic framework which not only provides
a probabilistic interpretation of the model but also makes it possible to
enhance the model with developed theories and methods of the probabilistic
models. For example, Latent Dirichlet Allocation model extends Probabilistic
LSA with a Dirichlet prior and reformulates the model with the Bayes method.
However, most of these models, including those latest developed ones, are only focusing
on learning hidden representation from one task. We extend the idea by learning a
single set of hidden representation from several different tasks. In my
presentation, I will introduce the definition of our model and our algorithm of
learning the hidden representation. Besides, I will also give a comparison of
the effects of different learning methods.
Jessica Gronski and Yi Zhang. Semantic Web for Search
Abstract. Semantic Web data seems like a promising source of information for improving search. While there is some literature about how semantic data should be used to enhance search, there are no positive conclusions about the best approach. In this talk we briefly describe semantic web data, existing approaches to using external data for search, our proposed approaches, the experimental design for evaluating solutions, and some preliminary results.
Jonathan Koren and David Buttler. Automatic Generation of Ontologies Through Vectorized Features
Abstract. Searching large document collections can difficult for users if they are not already familiar with the contents of the collection. Traditional keyword based search engines often fail to provide users with suggested refinements to their queries. One possible solution is to use structured metadata that is associated with the documents to provide refinements of the query results. Such systems are known as "faceted search interfaces."
This work uses textual analysis, and Wikipedia as substrate, to build a graph of concepts extracted from the unstructured text in document collection. By assigning vectors of features to both the nodes and edges of the graph, we identify, using machine learning techniques, concepts that are likely to be useful for users for query refinement. These "interesting" concepts, along with their incident edges, form an ontology for the collection. This generated ontology can then be used as the basis for faceted search interface for the document collection.
This work was done at Lawrence Livermore National Lab
Yi Sun and John Musacchio. Market Mechanisms for Near-Real-Time Landing and Takeoff Slot Allocation
We propose to develop market based mechanisms for allocating air craft landing and takeoff slots in near real time. Weather can greatly affect the landing capacity of a number of major airports, which can lead to severe delays. By using a market mechanism to reallocate slots, the flights that have the greatest need for arriving on or near their scheduled time can be prioritized, as operators of such flights would be willing to bid more. Our initial investigation focus on a novel idea using a bidding language based on temporal utility functions in the context of a Vickery-Clarke-Grove (VCG) mechanism. The scheme allows aircraft operators to express their utility for each of a series of slots with a single bid. VCG type mechanisms work by solving an optimization problem and our choice for the form of the temporal utility function makes the solution easy to compute, even with a large number of bids. We find that in a well constructed VCG type mechanism, truthful telling is the dominant strategy for all aircraft operators in expressing their utility, and the run way efficiency can be improved by using a simple optimization rule based on the utility function reported by aircrafts.
Joseph Rios and Kevin Ross. Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling
Abstract. Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
Michael Singer and John Musacchio. Model of an International Environmental Agreement among Asymmetric Nations applied to Debris Mitigation
Abstract. We investigate how ideas from the International Environmental Agreement (IEA) literature can be applied to the problem of space debris mitigation. The problem of space debris is similar to other international environmental problems in that there is a potential for a tragedy of the commons effect--individual nations bear all the cost of their mitigation measures but share only a fraction of the benefit. Consequently, nations have a tendency to underinvest in mitigation. Coalitions of nations, brought together by IEAs, have the potential to lessen the tragedy of the commons effect by pooling the costs and benefits of mitigation. This work brings together two recent modeling advances: i) a game theoretic model for studying the potential gains from IEA cooperation between nations with asymmetric costs and benefits, ii) an orbital debris model that gives the societal cost that specific actions, such as failing to deorbit an inactive satellite, have on the environment. We combine these two models with empirical launch share data for a "proof of concept" of an IEA for a single mitigation measure, deorbiting spacecraft at the end of operational lifetime. Simulations of all possible coalitions for a proxy set of 12 asymmetric nations suggest the possibility that stable coalitions can provide significant deorbiting gains relative to nations acting in the absence of an IEA coalition.
Sarah Tyler, Yun Chi, Shenghuo Zhu and Yi Zhang. Ordering Innovators and Laggards for Product Categorization and Recommendation
Abstract. Different buyers exhibit different purchasing behaviors. Some
rush to purchase new products while others tend to be more
cautious, waiting for reviews from people they trust. In market
analysis, the former group of buyers is often referred to
as innovators and early adopters while the latter group is
referred to as laggards. The adoption behavior is a dynamic
feature of the user and varies over groups of products, e.g.,
innovators of literature may not be the innovators of electronics.
The adoption order of users is a dynamic feature
of the product, which can help to predict the future potential
buyers. However, such dynamic features are usually
unavailable in the description of products. In this paper,
we study the user behavior of an online review website|
Epinions.com. We first propose to model user adoption behaviors
by creating a total ordering among users who rate
the products in a given category. We develop a greedy algorithm
and a Markov-chain based algorithm for computing
the category total ordering. Next, we show that by using
user behavior information, we can more accurately predict
the category of a new product as well as predict which users
will follow. Furthermore, by using the Epinion.com trust
network as evidence, we demonstrate that our total ordering
can group users into communities that closely resemble
the trust network. Thus the adoption order can be a useful
feature in recommendation systems.
Karla L Caballero, Joel Barajas and Ramakrishna Akella. Dynamic Recomender System in Citation Networks
Abstract. There has been extensive research on recommender systems (in the context of social networks), particularly motivated by the Netflix challenge and other needs more recently. Current tools like Google Scholar and CiteseerX allow users to retrieve relevant papers. These documents provide a social network based on citations which can be exploited to recommend relevant papers or authors to the user. Also, we can model the communities and interests evolution for each author. In this presentation, we explore the use of recommender systems in retrieval assuming a social network from citations. For instance, if we have a navigational query (the user knows exactly which document he/she wants) we can suggest relevant papers based on the abstracts and the social network. We model communities and authors are mixtures of interests where the observations for these are the abstracts of the papers. To allow the evolution of interest, we discretize the time and model the interest as generated from previous history. Here these dynamics are assumed to be Markovian.
Lanbo Zhang and Yi Zhang. User Profile Initialization for Personalized Search and Filtering
Abstract. In personalized search and filtering, a retrieval/filtering system usually maintains one profile for each of its users. User profiles are the basis for retrieval/filtering systems to decide whether a document is interesting to a user.
In past decades, relevance feedback mechanism has been presented to help retrieval/filtering systems build user profiles. In this mechanism, users are given some documents and are requested to give feedback on whether these documents are relevant to their information needs or not. Retrieval/Filtering systems then use users' feedback to initialize user profiles.
We present a novel mechanism to help initialize user profiles, which we call faceted feedback. We view document metadata as facets: each metadata field corresponds to a facet, and values assigned to a field are values of the corresponding facet. For example, "Date" field is a facet, "2009-09-18" is a value for facet "Date", and thus "Date: 2009-09-18" is a facet-value pair. Unlike traditional relevance feedback, faceted feedback allows users to judge the relevance of facet-value pairs with their information needs. Our experimental results have proven faceted feedback is effective and has its own advantages over traditional relevance feedback mechanism.
Rany Polany and Subhas Desa. A Strategy and Framework of an Automatic Pricing Optimization System for Competitive Advantage in an E-Retailing Environment.
Abstract. Due to the large quantity of eRetailer’s on the World Wide Web, it is imperative for these businesses to implement new Management of Technology services to maintain competitive advantage in this environment. The method proposed in this paper is for an eRetailer to utilize an electronic commerce system that is self-learning about competitor pricing, and based on the real-time data retrieved from the fluctuating market, is capable of automatically updating internal product pricing.
In this paper, we develop a Strategy to augment an eRetailer’s e-Commerce system with a price gathering agent utility and a Framework for automatically optimizing product profit margin and retail pricing using the collected information. We discuss the functionality of the Framework components and how the interaction between these components manages the automatic price optimization to increase the attraction of price discerning consumers. A process is proposed for performing the price optimization analysis and computations.
In the conclusion is a discussion about possible forms of implementation, and areas that need further research towards developing the research into a Pricing Decision Support System.
Janet Singer. Systems Perspectives on Service Science
Abstract. Recent service science literature positions that emerging discipline as a specialization of systems science concerned exclusively with human-made systems. However, researchers have only just started to identify ideas from systems science relevant to the new discipline. A review of writings by Alfred North Whitehead, Herbert Simon, C. West Churchman, John Warfield, Nigel Cross, and Hiroshi Deguchi on the need for a new pragmatic systems discipline yields four key themes: 1) the new discipline should present a "third culture" of design, while drawing from the sciences and humanities; 2) the new discipline should be developed simultaneously on levels of science, meta-science, and foundations; 3) the core competences of the new discipline should be modeling and simulation; and 4) methods and models of the new discipline should integrate quantitative, qualitative, and critical systems approaches. Though the subject matter is broad and fundamentally complex, these ideas present a clear, consistent, and coherent basis for developing the new discipline.
Shawn R. Wolfe and Yi Zhang. Incorporating (Multiple) Criteria into Information Retrieval
Abstract. Understanding the user's need is a crucial aim in information retrieval. For the most part, the focus has been on capturing the content-based portion of the user's need, but other factors (such as novelty) can play a role. In this talk, we describe our exploration of using additional criteria to capture other elements of the user's need. We present our results from preliminary experiments exploring the benefit of using additional criteria, as well as the interaction criteria may have. We also present some of the current challenges to using additional criteria in information retrieval.
Xing Xing and Yi Zhang. Contextual Image Retrieval with Relevance Feedback
Abstract. Taken a word (or phrase) in a document as a query, a contextual
image retrieval system tries to return back images that match the
word (or phrase) in the given context. The most challenging problem
for a contextual image retrieval system is to annotate the query
with appropriate and meaningful images according to its context.
In order to overcome the aforementioned challenge, we first explore
the problem of query difficulty for contextual image retrieval,
which refers to a criterion of how difficult to represent the query
as images. We exploit machine learning algorithms to learn the query
difficulty prediction models based on several assumptions proposed
based on the characteristics of the query words as well as the
context. After preparing our dataset with more than ten million
images and corresponding webpages crawled from several popular image
search engines, we then use two mixture models to retrieve images
for the noun or noun phrase query in the document with special
considerations on the query context, and the weight of the mixture
models is also determined by machine learning algorithms. Finally,
in order to further improve the retrieval performance, we employ the
relevance feedback techniques to do query expansion and to reduce
the ``semantic gap'' between query sense and image content. The
system performance is evaluated and compared using the data
collected in a user study, which is carried out in a real web-based
contextual image retrieval system with several users.
Shuang Wu and John Mussachio, The Effect of Incomplete Information and Collusion in Upgrade Decision Games
We study the strategic decisions of $N$ interconnected agents deciding whether to upgrade their systems. The agents can either be Internet Service Providers who can upgrade their networks or users/organizations that decide to invest in security to reduce their future expected loss from attacks. An agent's revenue depends not only on his upgrade decision, but also on the upgrade decisions of others. An agent that chooses not to upgrade earns a free-rider benefit if others upgrade. We first show that pure Nash equilibria always exist in this upgrade decision game.
We then investigate the upgrade decision game under cost uncertainty.
We assume that each player's upgrading cost is known only to that player but it is common knowledge that the costs are drawn from the same distribution.
We first prove that a Bayesian equilibrium always exists and then show that how bad cost uncertainty can hurt players' social welfare.
We also study how the option to form coalitions can improve players' social welfare.
Yize Li and Yi Zhang. Integrating implicit feedback for recommender systems
Recommender System has achieved great success in industry application. For example, online stores, such as Amazon and Netflix, provide customized recommendations for additional products or services based on a user's history. However, the traditional recommender systems suffer from the data sparsity problem - the user-item matrix is very sparse. Most of users provide fewer or none explicit feedback, i.e. product ratings or reviews. One way to solve this problem is to turn to the implicit feedback, such as purchase history or click through information, to infer users' interest. We introduce a model to integrate implicit feedback to help improve performance.
TIM students need to submit abstracts for presentation in the Symposium. Please Click here to submit
your abstract before September 19 (Saturday) 2009.
One of the presentations will be selected for a "Best Presentation Award", which will be announced at the end of the Symposium.



