In this paper, we describe the design of a specialized high-performance web crawler that runs in a decentralized fashion. It is specialized for scraping data from New Media web sites such as blogs, Twitter, Facebook, etc., which has grown exponentially in recent years. The crawler is designed to be easily scalable, from a single node to hundreds or many more, to be resilient against crashes and other...
This paper presents a social computing tool that centers around social scientists. In recent years, we have worked with social scientists and cultural anthropologists. We learned their ways of studying subjects in social media, their needs, and their interests. In the process, we have built a generic platform for collecting data in the blogosphere, tracking blogs of particular interest,...
The massive adoption of social media has provided new ways for individuals to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information about a variety of topics. It is thus a huge think tank that creates an enormous and ever-changing archive of open source intelligence. Modeling and mining this vast pool of data to extract, exploit and describe...
Weblogs, and other forms of social media, differ from traditional Web content in many ways. One of the most important differences is the highly temporal nature of the content. Applications that leverage social media content must, to be effective, have access to this data with minimal publication/acquisition latency. An effective Weblog crawler should satisfy the following requirements: low latency,...