I've been hearing from multiple sources that Yahoo Pipes is a nice way to build not-too-complex data aggregation and manipulation tasks, so this weekend I spent a few hours at night playing with it, building a small proof of concept and getting a feel for how useful it is. And I like it :)
The service is free, although rate-limited as expected, and you don't need any IDE: the website has a nice web editor and debugger.
The official documentation has lots of examples and most details you will need, although some module examples are broken. In practice, the approach that worked best for me was to just browse the catalog of published pipes to see examples of how to do something, then practice. There are some introductory videos too, visual and easy to follow.
It reminds me of how Lego Mindstorms "programming" was done: by joining small building blocks. You pick the block that fetches data, then another that loops over the fetched items and fetches RSS entries, then another to do some renaming, a bit of regex love if you want, and then sorting and truncating so you don't return hundreds of results.
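To make the block-by-block idea concrete, here is a rough Python sketch of the same chain of steps. It is not how Pipes works internally, just the equivalent logic written by hand. The function names, the `fetch` callback and the assumption that every item carries a valid `pubDate` are mine, for illustration only.

```python
import re
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_urls_from_opml(opml_text):
    """Return the xmlUrl attribute of every <outline> element that has one."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def items_from_rss(rss_text):
    """Turn each RSS <item> into a plain dict."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def aggregate(opml_text, fetch, limit=20):
    """Chain the blocks: fetch data, loop, fetch feeds, rename, regex, sort, truncate.

    `fetch` is a hypothetical callable that takes a URL and returns the feed text.
    """
    items = []
    for url in feed_urls_from_opml(opml_text):        # "loop" block
        items.extend(items_from_rss(fetch(url)))      # "fetch feed" block
    for item in items:
        item["date"] = item.pop("pubDate")            # "rename" block
        # "regex" block: collapse runs of whitespace in the title
        item["title"] = re.sub(r"\s+", " ", item["title"]).strip()
    # "sort" block: newest first (assumes every item has a valid RFC 822 date)
    items.sort(key=lambda i: parsedate_to_datetime(i["date"]), reverse=True)
    return items[:limit]                              # "truncate" block
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; in real use it could be any function that downloads a URL and returns its body as text.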
Here is how the first pipe I built looks, an OPML to aggregated RSS pipe (published and available from
Debugging is quite easy, as the lower part of the editor has a "debug output" frame that shows you the output of the currently selected module, which is really, really handy. It sometimes needs a manual refresh or two, but it will be your "eyes" inside the pipe until you have finished building it.
Once built, you can try it from the web and see how it looks in a sample table, or choose other output formats: web service, RSS, JSON... It is clearly built to be used as a service. If you have inputs, you just specify them in the request as query string parameters and you're good to go.
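As a small illustration of passing inputs in the query string, this Python sketch builds such a request URL. The endpoint and the `_id` / `_render` parameter names follow the pattern I remember pipe URLs using, so treat them as assumptions; the pipe id and the `opml_url` input name are made-up placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; check it against the URL the site shows for your own pipe.
PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"


def pipe_url(pipe_id, render="json", **inputs):
    """Build a pipe request URL with its inputs as query string parameters."""
    params = {"_id": pipe_id, "_render": render}
    params.update(inputs)  # user-defined pipe inputs, e.g. opml_url=...
    return PIPE_RUN_URL + "?" + urlencode(params)


# Hypothetical pipe id and input name, for illustration only.
print(pipe_url("abc123", render="rss", opml_url="http://example.com/feeds.opml"))
```

`urlencode` takes care of escaping, which matters here because the input value is itself a URL.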
The only issue I found while using it was that Yahoo Pipes seems to use types internally, and when you "fetch data" and just output it, it decides not to let you do anything that is not generic. For example, you can loop over fetched RSS data and truncate the description length, or modify its contents by using String modules inside the loop. But if you fetch data (as I do in my pipe), the Loop module does not allow a String module as a child. I can see the point for input filtering (for example, you need a URL Input module to pipe into the Fetch Data URL source), but when manipulating custom data it should just let me do it, and if it fails then that is my problem. I guess they don't have faith in the mass of pipe users and prefer to harden the framework against errors.
Of course you can also use pipes inside other pipes. I am currently building another pipe that uses the one explained above as the input, then builds a summary of the fetched items (just the first 1k characters, no HTML tags, and things like that), cutting down the final output to just a few KB.
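For reference, the summary step of that second pipe boils down to something like the following Python sketch. The `summarize` name and the regex-based tag stripping are my own simplification, not anything Pipes provides; a real HTML parser would be more robust on messy markup.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude: matches anything that looks like a tag


def summarize(description, max_chars=1000):
    """Strip HTML tags, decode entities, collapse whitespace, keep the first 1k chars."""
    text = TAG_RE.sub(" ", description)        # drop tags, leave a space in their place
    text = html.unescape(text)                 # &amp; -> &, &lt; -> <, ...
    text = re.sub(r"\s+", " ", text).strip()   # collapse the leftover whitespace
    return text[:max_chars]
```

Truncating after the tags are removed matters: cutting the raw HTML at 1k characters could slice a tag in half and would count markup against the limit.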
In a world of information overload, this kind of tool can be really interesting. I can now think of a dozen nice use cases, from automatic RSS filtering to data crawling or automated advanced searches...