Web data extraction
If your organization wants to design and develop a comprehensive information system,
the first challenge you face is extracting data from the World Wide Web.
Issues that arise include the extraction, validation, and management of the large
amount of data available on the internet.
These data are typically of low quality, with format mismatches
and content errors that make things more difficult.
The most popular techniques in practice for effective web data extraction are
regular expressions and wrappers.
These techniques offer flexible and scalable mechanisms to
harvest the necessary data from various web resources such as directories, forums,
and blogs.
Since all these web sources are quite assorted, it is nearly impossible to
build and maintain a huge database for business intelligence and market research
purposes by hand.
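
As a rough illustration of the regular-expression approach, the sketch below pulls author names, dates, and email addresses out of a small piece of forum-style markup. The sample markup, the pattern names, and the extract_fields helper are all hypothetical and chosen only for this example; the patterns are deliberately simple and would need tuning for real pages.

```python
import re

# Hypothetical forum markup, invented for this example (not from a real site).
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span> wrote on 2021-03-04: contact me at alice@example.com</div>
<div class="post"><span class="author">bob</span> wrote on 2021-03-05: no contact details</div>
"""

# Deliberately simple patterns; real email and date formats vary more widely.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AUTHOR_RE = re.compile(r'<span class="author">([^<]+)</span>')


def extract_fields(html):
    """Return every author, ISO-style date, and email address found in html."""
    return {
        "authors": AUTHOR_RE.findall(html),
        "dates": DATE_RE.findall(html),
        "emails": EMAIL_RE.findall(html),
    }


fields = extract_fields(SAMPLE_HTML)
print(fields)
```

Patterns like these are quick to write but tied to one page layout, which is part of why heterogeneous sources are hard to maintain at scale.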
Wrappers are dedicated applications that automatically harvest data from online
documents and store the information in a specified structured format.
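
A minimal sketch of what such a wrapper might look like is shown here, using only the Python standard library. It assumes a hypothetical directory page in which each entry is an li element with class "entry" and each field is a span whose class names the field; the DirectoryWrapper class, the wrap function, and the sample page are illustrative assumptions, not a description of any particular product.

```python
import json
from html.parser import HTMLParser


class DirectoryWrapper(HTMLParser):
    """Collects one record per <li class="entry"> on a hypothetical directory page."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled, or None outside an entry
        self._field = None    # field name of the <span> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "entry":
            self._current = {}
        elif tag == "span" and self._current is not None:
            self._field = attrs.get("class")

    def handle_data(self, data):
        # Only text inside a field <span> within an entry is stored.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def wrap(html):
    """Parse html and return the extracted records as a list of dicts."""
    parser = DirectoryWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical page content, invented for this example.
SAMPLE_PAGE = """
<ul>
  <li class="entry"><span class="name">Acme Corp</span> <span class="city">Berlin</span></li>
  <li class="entry"><span class="name">Globex</span> <span class="city">Oslo</span></li>
</ul>
"""

records = wrap(SAMPLE_PAGE)
print(json.dumps(records, indent=2))  # structured output, here serialized as JSON
```

JSON is used here only as one example of a structured target format; the same records could just as well be written to CSV or a database table.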
The wrapper
application