Today, the World Wide Web (or the Web for short) is a huge source of
information. Before the Web, finding information meant asking another person or
looking it up in books or other kinds of text documents. Now, if we need
information about something, we can simply open a web browser and search for it
with a web search engine such as Google. The Web is also a popular
communication medium. People interact with each other through web forums and
social networking sites such as Facebook and Twitter. Finally, the Web is an
important channel for conducting business. Many companies use the Web to run
product campaigns or to open online stores.
Because of these important uses of the Web, much research has been
conducted on extracting useful information from it. According to Liu (2007),
web mining aims to discover useful information or knowledge from the Web's
hyperlink structure, page content, and usage data. Based on the primary kind
of data used in the mining process, web mining tasks can be categorized into
three types: web structure mining, web content mining, and web usage mining.
Web Structure Mining
Web structure mining aims to discover useful knowledge from hyperlinks,
which represent the structure of the Web. A hyperlink is a link in a web page
that refers to another region of the same page or to another web page.
The most popular application of web structure mining is calculating the
importance of web pages. The Google search engine uses this kind of
application to order its search results. PageRank, a web structure mining
algorithm, was invented by Google's founders, Larry Page and Sergey Brin. Web
structure mining can also be applied to cluster or classify web pages (Gomes
and Gong, 2005).
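To make the idea of link-based importance concrete, the following is a minimal sketch of the textbook power-iteration form of PageRank on a toy graph. It is an illustration only, not the algorithm as deployed in any search engine. The function name pagerank, the four-page graph toy_web, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping factor 0.85 is a commonly quoted value, assumed here.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, page C collects links from three other pages and ends up ranked highest, while page D, which no page links to, ends up lowest. A search engine could use such scores as one signal when ordering results.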
Web Content Mining
Web content mining extracts useful information or knowledge from web page
contents. There are two categories of web content mining: structured data
extraction and text mining. The idea behind structured data extraction is that
many web sites display important information retrieved from their databases
using fixed templates. We can identify those templates by finding repeated
patterns in web pages. Apart from structured data, the Web also contains a huge
amount of unstructured text written in natural language. One of the common
tasks in text mining is extracting people's opinions or sentiments expressed in
product reviews, forum posts, social networks, and blogs.
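The repeated-pattern idea behind structured data extraction can be sketched in a few lines. The example below treats the tag path leading to each piece of text as a rough stand-in for a template slot, and keeps only the paths that occur several times in a page. This is a simplification for illustration, not a published extraction algorithm. The class PathCollector, the function repeated_fields, and the sample page are hypothetical, and the sketch assumes well-formed HTML without void elements such as br or img.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Records the tag path (for example 'ul/li/span.price') of each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # Include the class attribute, if any, to tell sibling fields apart.
        css_class = dict(attrs).get("class")
        self.stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Simplification: assumes every start tag has a matching end tag.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts_by_path["/".join(self.stack)].append(text)


def repeated_fields(html, min_repeats=2):
    """Return the tag paths that occur at least min_repeats times, with their texts."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return {
        path: texts
        for path, texts in collector.texts_by_path.items()
        if len(texts) >= min_repeats
    }


# Hypothetical page generated from a product-list template.
page = """
<html><body>
<h1>Catalog</h1>
<ul>
  <li><span class="name">Laptop</span><span class="price">900</span></li>
  <li><span class="name">Phone</span><span class="price">500</span></li>
  <li><span class="name">Tablet</span><span class="price">300</span></li>
</ul>
</body></html>
"""

fields = repeated_fields(page)
for path, texts in fields.items():
    print(path, texts)
```

The page heading appears only once and is discarded, while the name and price paths repeat for every product and are kept as candidate database fields.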
Web Usage Mining
Web usage mining aims to capture and model behavioral patterns and profiles
of users who interact with a web site. Such patterns can be used to better
understand the behaviors of different user segments, to improve the
organization and structure of the site, and to create personalized experiences
for users by providing dynamic recommendations of products and services. Unlike
the two previous web mining tasks, the primary data source for web usage mining
is the web server access log, not the web pages themselves.
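As a rough illustration of how an access log becomes usage data, the sketch below parses a few log lines, groups each visitor's requests into sessions, and counts page views. Several things here are assumptions rather than facts from the sources cited: the log is taken to be in the Common Log Format, a visitor is identified only by host address, a session ends after 30 minutes of inactivity, and the sample log lines are invented for the example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, timestamp, path) for a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["host"], timestamp, match["path"]


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Split each host's requests into sessions at idle gaps longer than timeout."""
    visits = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            host, timestamp, path = parsed
            visits[host].append((timestamp, path))

    sessions = []
    for host, hits in visits.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_time, _), (time, path) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                # The idle gap is too long: close the session and start a new one.
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions


# Invented sample log lines, for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:56:10 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [10/Oct/2023:14:01:02 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:15:30:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

sessions = sessionize(sample_log)
page_counts = Counter(path for _, paths in sessions for path in paths)
print(sessions)
print(page_counts.most_common())
```

The resulting sessions and page counts are the kind of raw material from which navigation patterns, user segments, and recommendations could later be derived. Real logs usually need further cleaning first, such as removing requests from crawlers and for images or style sheets.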
References
Gomes, M. and Gong, Z., 2005, Web Structure Mining: An
Introduction, Proceedings of the 2005 IEEE International Conference on
Information Acquisition
Liu, B., 2007, Web Data Mining: Exploring Hyperlinks, Contents, and
Usage Data, Springer