<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[uKeeper - from link to full page]]></title>
  <link href="http://www.ukeeper.com/atom.xml" rel="self"/>
  <link href="http://www.ukeeper.com/"/>
  <updated>2013-05-19T19:03:04-05:00</updated>
  <id>http://www.ukeeper.com/</id>
  <author>
    <name><![CDATA[Umputun]]></name>
    <email><![CDATA[ukeeper@umputun.com]]></email>
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Version 0.16.2]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-16-dot-2/"/>
    <updated>2013-03-29T17:52:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-16-dot-2</id>
    <content type="html"><![CDATA[<ul>
<li>A lot of back-end work on RMQ/MongoDB/MultiNAS support. It helps to build and maintain the distributed core of the system quickly and flexibly.</li>
<li>Added an async distributed event bus. It reduced the latency of metadata updates and made everything much more consistent. For example, no more &#8220;eventually consistent user&#8217;s email update&#8221;, which sometimes caused a delay of a few seconds until new or updated user info was distributed across all uKeeper nodes.</li>
<li>Introduced a new modular composition that allows safer and faster releases of new functionality.</li>
<li>Added support for Reddit pages</li>
<li>Added support for Surfingbird capturing</li>
<li>Fixed processing of &#8220;base href&#8221;, which resolved a number of issues with missing images.</li>
<li>Other minor bug fixes.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.15.3]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-15-dot-3/"/>
    <updated>2013-01-31T16:45:00-06:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-15-dot-3</id>
    <content type="html"><![CDATA[<ul>
<li>Improved processing of Twitter statuses</li>
<li>Fixed processing of Twitter pictures</li>
<li>Improved image extraction for referrer-sensitive sites</li>
<li>Added support for %i and %p in XMPP (Jabber) links</li>
<li>Added a workaround for incorrect titles returned by external parsers</li>
<li>Improved fault tolerance in non-S3 (alternative) storage</li>
<li>Added an experimental RMQ-based backend</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.15.2]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-15-dot-2/"/>
    <updated>2013-01-09T17:59:00-06:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-15-dot-2</id>
    <content type="html"><![CDATA[<p>This release is mostly bug fixes and internal improvements.</p>

<ul>
<li>Added ability to change S3 regions</li>
<li>Implemented a pessimistic strategy for data retrieval from S3</li>
<li>Added a backup shared storage method that can replace S3 completely if needed</li>
<li>Improved support for the official mobile Twitter client, for both &#8220;Mail Tweet&#8221; and &#8220;Mail Link&#8221;</li>
<li>Fixed an issue with images referenced by partially invalid URLs</li>
<li>Updated the AWS SDK to the current version</li>
<li>If a site doesn&#8217;t allow an image to be downloaded, uKeeper will keep the original link</li>
<li>Some minor performance improvements</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[AWS issues]]></title>
    <link href="http://www.ukeeper.com/news/aws-issues/"/>
    <updated>2013-01-04T14:17:00-06:00</updated>
    <id>http://www.ukeeper.com/news/aws-issues</id>
    <content type="html"><![CDATA[<p>Since yesterday I have been seeing an exceptionally high ratio of failed or significantly delayed S3 operations. As you may know, uKeeper&#8217;s internals run on AWS and make use of several services such as S3, SQS, DynamoDB and some others. S3 is especially important for the distributed data retrieval process: practically all article-related resources, such as images and other attachments, come from S3.</p>

<p>S3 works as eventually consistent storage, which means a simple thing: in some cases data written to S3 is not immediately visible to all readers. That is fine if readers expect such behavior and can wait long enough for the updated data to become available. uKeeper is one of those smart readers and has worked this way since day one. Usually the waiting period was 100-200 ms, and in very rare cases 500 ms. As a paranoid developer I put in place a waiting period of up to 15 sec, but since Apr 2012 I have seen only 4 cases with unusually high latency (~3 sec).</p>

<p>However, since yesterday S3 has been acting differently: a relatively large share of submitted objects is still not available hours after the write! I have informed AWS support, and it looks like they are trying to fix it. The number of such incidents is decreasing dramatically, and over the last 12 hours I got just 4 delayed writes.</p>

<p>On the uKeeper side this issue initially caused &#8220;request rejected&#8221; for some users. As soon as the problem was detected I put in place a hot-fix that allows articles to be processed even if one of the resources failed or was delayed by S3. In this case the user will get the article, but it may have a missing picture. Please note that this is a really, really rare case now, and hopefully AWS will get it fixed completely very soon.</p>

<p>From this incident I learned a few important things about &#8220;what to do if AWS is acting strange&#8221;, and I am going to implement a new set of backup strategies for cases like this.</p>

<p><em>UPD: 01/05 13:57 CDT – The problem with S3 was resolved completely.</em></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.15.1]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-15-dot-1/"/>
    <updated>2012-12-31T14:12:00-06:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-15-dot-1</id>
    <content type="html"><![CDATA[<ul>
<li>Added the ability to catch pages in <em>PDF format</em>. Adding %p to the subject will do the trick.</li>
<li>Added the ability to catch pages in <em>image format</em> (png). Adding %i to the subject will do the trick.</li>
<li>Added new, regexp-based rules for deeper URL matching. They allow tweaking the parsing for different parts of the same site if needed.</li>
<li>Added detection and extraction of embedded YouTube videos</li>
<li>Added support for non-Latin URLs</li>
<li>Implemented special parsing for Twitter conversations</li>
<li>Added support for Twitpic images and surrounding text</li>
<li>Fixed incorrect detection of image type in Evernote</li>
<li>Fixed an issue with relative image URLs without a leading /</li>
<li>Fixed an issue with URLs with &#8220;+&#8221; in the web catcher</li>
<li>Improved Instapaper &amp; Pocket forwarding</li>
<li>The header now shows the expanded URL, not the short one</li>
</ul>


<p><em>Both the PDF and PNG generators are based on <a href="http://code.google.com/p/wkhtmltopdf/">wkhtmltopdf</a>.</em></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.6]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-6/"/>
    <updated>2012-10-31T02:23:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-6</id>
    <content type="html"><![CDATA[<ul>
<li>Added link forwarding for Instapaper and Pocket destinations</li>
<li>Fixed extraction of multiple tags</li>
<li>Improved processing of binary attachments</li>
<li>Preview for cases with multiple URL matches</li>
<li>More fixes for URLs with spaces</li>
<li>Some other improvements, mostly performance related</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.5]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-5/"/>
    <updated>2012-10-16T13:24:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-5</id>
    <content type="html"><![CDATA[<ul>
<li>Added a workaround for an Evernote issue that caused duplicate PDFs.</li>
<li>Added a special parser and link converter for Google Groups.</li>
<li>Added the ability to catch pages/links preprocessed by Instapaper or Readability.</li>
<li>Better support for the title meta tag. This should fix duplicated titles in some cases and generally makes title extraction smarter and more predictable.</li>
<li>Fixed incorrect tag detection in subjects with links (empty body case).</li>
<li>Improved XMPP link detection for the Adium client.</li>
<li>Fixed parsing of URLs with &#8220;+&#8221;</li>
<li>Fixed processing of links with relative paths like &#8220;../../&#8221;</li>
<li>Proper cleaning of Unicode non-breaking spaces in URLs</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.4]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-4/"/>
    <updated>2012-09-24T18:21:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-4</id>
    <content type="html"><![CDATA[<ul>
<li>Added a new jQuery selector syntax for the new custom parser</li>
<li>Added support for custom titles in the selector-based parser</li>
<li>Special support for Blogger&#8217;s new dynamic themes</li>
<li>Improved resiliency of most S3-based operations</li>
<li>Added end-to-end, fully automatic monitoring systems for regular and full capturing</li>
<li>Fixed processing of tweets that contain only an image and of tweets with internal redirects</li>
<li>Fixed reporting and base processing for links with a hash character</li>
<li>Migrated to the latest stable Netty, 3.5.5</li>
<li>More fixes for problematic redirects</li>
<li>Simplified the download task manager</li>
<li>Custom parsing settings available for more testers</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.3]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-3/"/>
    <updated>2012-09-02T14:41:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-3</id>
    <content type="html"><![CDATA[<ul>
<li>Added support for multiple links via Jabber (up to 10 links in a single message)</li>
<li>Added ability to disable extra redirects for some sites, forced login in such cases</li>
<li>Proper support for existing custom rules in preview-rules/edit-rules forms</li>
<li>Implemented better concurrency for page preview (custom parser)</li>
<li>Optimized downloading of page resources (images and other embedded elements)</li>
<li>Better support for the Reply-To field</li>
<li>Fixed an issue that, in some rare cases, caused an email alias to be used instead of the correct email</li>
<li>Improved extraction of pages with multiple redirects</li>
<li>Fixed processing of parametrized and dynamic images</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.2]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-2/"/>
    <updated>2012-08-14T18:42:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-2</id>
    <content type="html"><![CDATA[<p>Internal improvements, mostly around custom parsing.</p>

<ul>
<li>Added the ability to change the User-Agent for custom sites. For instance, this allows switching to the mobile version when needed.</li>
<li>Added a generic URL converter for custom URL mapping. This is useful for tricky sites where the article link has to be derived.</li>
<li>Excluded elements in custom parsers can now be specified by attribute value, in addition to class name and id.</li>
<li>DKIM is disabled for now; it seems to produce some sort of incomplete signature</li>
<li>Development migrated to IntelliJ IDEA</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.14.1]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-14-dot-1/"/>
    <updated>2012-08-07T17:32:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-14-dot-1</id>
    <content type="html"><![CDATA[<ul>
<li>Added optional multi-link support. To turn it on, put &#8216;^&#8217; at the beginning of your subject.</li>
<li>Added a DKIM (DomainKeys Identified Mail) signature to all uKeeper emails. This should help anti-spam filters treat our emails better.</li>
<li>Added support for basic authentication in incoming links. It is still a <strong>very bad</strong> idea to email your user:password, but if you really have to, uKeeper will process it.</li>
<li>For the most forgetful users: if you didn&#8217;t put any link in either the subject or the body, uKeeper will try to do a basic search for you, based on the provided subject.</li>
<li>Added support for user-defined parsing rules. Not activated yet for most users, only for a few selected beta testers.</li>
<li>Improved detection, processing and expansion of short URLs.</li>
<li>Added unlimited depth for parents in the custom parser&#8217;s matching.</li>
<li>Added support for aliases for the most popular email services.</li>
<li>Some minor Outlook-related fixes.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.12.6]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-6/"/>
    <updated>2012-07-20T21:34:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-6</id>
    <content type="html"><![CDATA[<ul>
<li>improved processing of short links. In some cases the redirect caused the encoding to be lost and prevented custom parsing.</li>
<li>added special extraction for dzone.com links</li>
<li>improved detection of slow pages and implemented timeout auto-adjustment</li>
<li>fixed a rare problem with incorrect title for persistent subject mode</li>
<li>preparation for user-defined parsing parameters with live preview</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Support changes]]></title>
    <link href="http://www.ukeeper.com/news/support-changes/"/>
    <updated>2012-07-13T19:52:00-05:00</updated>
    <id>http://www.ukeeper.com/news/support-changes</id>
    <content type="html"><![CDATA[<ul>
<li>We have migrated to <a href="http://ukeeper.uservoice.com">UserVoice</a> for all types of ticketing, support and sharing of new ideas. Feel free to hit us with your bright ideas and sad bug reports.</li>
<li>The &#8220;Report error&#8221; link now opens a simple form where you can add a description and, optionally, your name. The form is integrated with our new ticketing system.</li>
<li>Online chat proved to be just a useless toy - removed.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.12.5]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-5/"/>
    <updated>2012-07-12T21:47:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-5</id>
    <content type="html"><![CDATA[<ul>
<li>Added ability to use custom parsers in FULL mode</li>
<li>Improved compatibility with some websites that rejected uKeeper&#8217;s image requests</li>
<li>Correct processing of image and resource URLs with mixed slashes and backslashes</li>
<li>The custom parser is smarter and can detect the first match</li>
<li>Added support for email addresses with &#8220;+&#8221;</li>
<li>More fixes for empty titles</li>
<li>Fixed web-ukeeper for URLs with &#8220;&amp;&#8221;</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.12.4]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-4/"/>
    <updated>2012-07-06T20:13:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-4</id>
    <content type="html"><![CDATA[<ul>
<li>improved accuracy of Outlook link processing. In some cases silly Outlook decides to split your link.</li>
<li>support of exotic charsets simplified and unified.</li>
<li>resolved problems with content and subject in different encodings in forced ! mode.</li>
<li>if a page doesn&#8217;t have a title and one can&#8217;t be extracted from the content, uKeeper uses the site&#8217;s domain as the title.</li>
<li>resolved a problem with non-Latin URLs containing special characters.</li>
<li>increased timeout for data extraction to allow slow sites and/or big requests to be processed.</li>
<li>fixed processing issue for some links with double-quotes.</li>
<li>added ability to customize article extraction in the new parser on the fly.</li>
</ul>




<p>The last item is an extremely interesting one. It continues the &#8220;very experimental article extractor&#8221; introduced in v0.11, and this extractor allows adjusting uKeeper&#8217;s behavior at the micro level. For example, making a good-looking-in-your-mailbox article from some exotic site with a very unusual layout can be done in a minute. It also allows handling known &#8220;problematic&#8221; sites with minimal article content but a lot of valuable comments. And one more thing: this custom extractor could even become user-defined some day, and you will be able to specify special behavior (processing) for your favorite site(s).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Version 0.12.3]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-3/"/>
    <updated>2012-06-29T17:39:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-3</id>
    <content type="html"><![CDATA[<ul>
<li>Fixed a bug, introduced in 0.12.2, that prevented processing of short, case-sensitive links</li>
<li>Added support for some upper-case links that are supposed to be lower-case</li>
<li>Added smarter processing for Twitter statuses. uKeeper will try to extract the link from the status, if present.</li>
<li>Added the ability to implement custom preprocessors for predefined sites. This allows all sorts of interesting tricks. For example, for map sites I will be able to request the print-friendly version instead of the regular one, which is so hard to get data from.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Version 0.12.2]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-2/"/>
    <updated>2012-06-28T14:43:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-2</id>
    <content type="html"><![CDATA[<ul>
<li>Added a new way to send links: via Jabber. See <a href="http://www.ukeeper.com/extras/">Extras</a> &#8220;uKeeper via Jabber&#8221;. Not fully polished yet, but it works.</li>
<li>Fixed a problem with &#8220;(&#8221; and &#8220;)&#8221; in links that prevented correct processing.</li>
<li>Added the ability to capture YouTube links.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.12.1]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-12-dot-1/"/>
    <updated>2012-06-20T18:59:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-12-dot-1</id>
    <content type="html"><![CDATA[<p>This is a fairly significant update with massive improvements in data extraction. Version 0.12.1 adds several different methods of article detection and combines all of them to get a clean and complete page.</p>

<ul>
<li>Improved accuracy of article detection and extraction.</li>
<li>Added the ability to process multi-page articles.</li>
<li>Added support for new binary formats: doc, xls, ppt, vsd, vst, zip, gz, tgz.</li>
<li>Added the ability to adjust extraction rules manually. This should allow almost instant fixes for reported problems.</li>
<li>Added multi-step analysis of extracted data with automatic tuning, if necessary.</li>
<li>Smarter detection of article titles.</li>
<li>Added initial content extraction from emailed tweets. So far tested on just a few iOS clients. Your feedback will help to make it right.</li>
<li>Fixed an issue with broken formatting for some extracted pages.</li>
<li>Fixed an issue with partial articles for some links.</li>
<li>Improved processing of non-article/index pages, like search results and so on.</li>
<li>Fixed processing of links with spaces.</li>
<li>Fixed an incorrect support link in uKeeper&#8217;s error email.</li>
<li>Fixed the missing error email when a response is too big to be sent.</li>
<li>Disabled support for multi-line links until I have some smart way to handle them without affecting pure, single-line links with some content below.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.11.1]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-11-dot-1/"/>
    <updated>2012-06-15T16:18:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-11-dot-1</id>
    <content type="html"><![CDATA[<ul>
<li>Added an initial version of a <strong>very</strong> experimental article extractor. For now you can try it by adding &#8220;%&#8221; at the start of the subject. It&#8217;s buggy and fresh, but has great potential.</li>
<li>Resolved an issue with double-quotes in the encoding info sent by some crazy servers. In rare cases it caused a &#8220;reject email&#8221; from uKeeper.</li>
<li>Resolved a problem with multi-line links. Seems like it was mostly an Outlook thing.</li>
<li>Optimized downloading and processing of embedded pictures. The same image won&#8217;t be processed or attached multiple times anymore.</li>
<li>If the header&#8217;s encoding differs from the meta info, uKeeper will try to autodetect the encoding from the HTML; if the detected encoding matches either the header or the meta info, that one will be used.</li>
<li>Added automatic code deployment to all worker nodes</li>
<li>Created a template for new worker nodes and automation for a quick initial build.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[version 0.10.2]]></title>
    <link href="http://www.ukeeper.com/news/version-0-dot-10-dot-2/"/>
    <updated>2012-06-10T19:09:00-05:00</updated>
    <id>http://www.ukeeper.com/news/version-0-dot-10-dot-2</id>
    <content type="html"><![CDATA[<ul>
<li>responses migrated to SQS + S3. This allows more reliable delivery and higher redundancy. It also makes it simpler to scale uKeeper out.</li>
<li>added support for partially defined charset and some wrong, but widely used aliases.</li>
<li>improved binary email for images.</li>
<li>added auto-tuning for workers and several protections against excessive load.</li>
<li>added protection against multiple submissions of the same request by the same user in a short period of time.</li>
<li>fixed an issue with multiple error messages sent out in some rare cases.</li>
<li>message polling made less aggressive and more efficient.</li>
<li>other minor fixes</li>
</ul>

]]></content>
  </entry>
  
</feed>
