Friday, November 21, 2014

Citrix Director 7.6 Deep-Dive Part 7: New Trends Reporting Features [feedly]



----
Citrix Director 7.6 Deep-Dive Part 7: New Trends Reporting Features
// Citrix Blogs

In Director 7.0, we introduced the Trends view, which shows historical charts and data to help with troubleshooting (answering questions about what may have led up to an issue) and to enable EdgeSight performance management (trend data for detailed performance management, capacity management, and SLA monitoring).

----

Shared via my feedly reader


Sent from my iPhone

Attention partners! Learn how to empower your customers for success with Citrix Insight Services at Summit 2015! [feedly]



----
Attention partners! Learn how to empower your customers for success with Citrix Insight Services at Summit 2015!
// Citrix Blogs

The Citrix Summit 2015 Partner Event is fast approaching. If you're a Citrix Partner and have already registered for the event, great – we'll see you there! If you're a Citrix Partner and haven't already registered, I would strongly encourage you to do so by visiting http://www.citrixsummit.com/. Citrix Summit provides attendees with opportunities to immerse themselves in the latest technologies and partner programs while networking…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Database Sizing Tool for XenDesktop 7 [feedly]



----
Database Sizing Tool for XenDesktop 7
// Citrix Blogs

Currently, sizing the databases for XenDesktop relies on being able to interpret and understand the database sizing KB article CTX139508. That doesn't help much if your environment is a variation on the ones listed. To assist, I've created a simple tool that can help generate custom sizing information. Why not an Excel file? Many people have asked for Excel files or a simple formula…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Selling Technology has Changed to Benefit Citrix Partners! [feedly]



----
Selling Technology has Changed to Benefit Citrix Partners!
// Citrix Blogs

Due to the "seismic changes" in selling and marketing, B2B customers are more than halfway through the sales cycle before contacting a vendor. Because of the convergence of social media, cloud-based marketing solutions, big data, mobility and search engine optimization, 94% of buyers collect and share information prior to a business-to-business purchase decision (Sirius Decisions). Customers have practically infinite online resources to pull from. Armed with…

Read More


----

Shared via my feedly reader


Sent from my iPhone

The eG Monitor for Citrix ShareFile is Citrix Ready [feedly]



----
The eG Monitor for Citrix ShareFile is Citrix Ready
// Citrix Blogs

We are pleased to announce that eG Monitor for Citrix ShareFile is now Citrix Ready! eG Innovations provides performance management solutions that dramatically accelerate the discovery, diagnosis, and resolution of service performance issues in virtual, cloud, and physical service infrastructures. eG Innovations' performance management and monitoring solutions are used by the most demanding companies to enable delightful user experiences and keep mission-critical business services at…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Multi-Tenancy Redefined With Admin Partitions [feedly]



----
Multi-Tenancy Redefined With Admin Partitions
// Citrix Blogs

The core requirement for datacenter deployments has become the ability to run multiple services or instances on the same infrastructure. The end-customer use case for doing so is what defines multi-tenancy. It has been a requirement for many years, and we were the first ADC vendor to introduce a true and complete multi-tenant solution with the NetScaler SDX platform. NetScaler SDX defines multi-tenancy across…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Monitor your Cloud with ScienceLogic [feedly]



----
Monitor your Cloud with ScienceLogic
// Citrix Blogs

What do Old Bay potato chips, the Potomac River and IT monitoring have in common? They all converged at the ScienceLogic Symposium in National Harbor, Maryland at the end of October, where I had the opportunity to represent Citrix as one of their technology partners. ScienceLogic is a Citrix Ready certified partner whose IT monitoring software integrates with Citrix CloudPlatform. I learned that more and…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Polaris Office (Android) is Now Worx Verified [feedly]



----
Polaris Office (Android) is Now Worx Verified
// Citrix Blogs

Infraware's Polaris Office Enterprise is now Worx verified with XenMobile 9.04. The MDX package is listed on the Worx App Gallery here, and the app is also available for download from the Play Store here. About POLARIS Office for Citrix: POLARIS Office for Citrix is a mobile Office application that allows Citrix users to view and edit Microsoft Office (Word, Excel and PowerPoint) documents on smartphones and tablets. Users…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Results : XenDesktop 7.x WAN Survey [feedly]



----
Results : XenDesktop 7.x WAN Survey
// Citrix Blogs

Survey Results: We wanted to share a quick summary of the results from the WAN survey we published a few weeks ago. We had a very positive response rate to the survey, and we'd like to send you a big thank you for that. We have now analysed the results to plot the types of WAN links customers have to branch offices. Some customers…

Read More


----

Shared via my feedly reader


Sent from my iPhone

Thursday, November 20, 2014

Globo chooses CloudStack for its application development and system management platform [feedly]



----
Globo chooses CloudStack for its application development and system management platform
// CloudStack Consultancy & CloudStack...

Globo.com is the Internet arm of GRUPO Globo, the largest media conglomerate in Latin America, with offices in Rio de Janeiro and São Paulo.

It operates the largest vertical web portals in Brazil, focussing on News (G1), Sports (globoesporte.com), Videos (Globo Videos) and Celebrities (Ego).

The company also acts as a service provider for all of the other media businesses in the group, giving them strategic and technology support for the online elements of their operations. It excels at high-volume web distribution and is responsible for the highest simultaneous video streaming audiences in the country.

Business situation

Globo.com wanted to move its application development and systems management practices from a traditional model, based on stand-alone applications deployed on physical servers, into a true cloud-based framework.

As part of this process they needed to replace their existing in-house virtualisation solution (that had been used for the previous three years to manage their development and QA environments) with a modern, full-featured and scalable solution to handle cloud infrastructure in production.

The solution

Globo.com ran an internal selection process to evaluate the solutions on the market and help decide on the best platform for their needs. They started with an assessment of the available products and produced a short list of five candidates for further evaluation, including both proprietary and open source alternatives. These candidates were then invited to install a proof-of-concept solution on Globo.com's premises, working with their technical team.

Each solution was then used as a test bed to set up and deploy two internal applications, in order to get a real-world feel for the features, performance, stability and ease of use of each product. The tests also focused on how to integrate each IaaS solution with Tsuru, Globo.com's own PaaS project. This gave them a better picture of how to reach the proposed goal of building a true cloud platform and not only a simple virtualisation layer on top of their servers.

During these tests, CloudStack demonstrated that it most closely matched the approach Globo.com wanted. The reasons for this included significantly more published case studies demonstrating that CloudStack is production-ready, support for many network models, a vendor-agnostic and open-standards approach, and a large, mature community ensuring that new features will continue to be developed. As a result, Globo.com selected CloudStack as their new cloud infrastructure solution.

Benefits

Fernando Carolo, Cloud Manager at Globo.com outlined the benefits. "First of all, we found that CloudStack is able to deliver all the functionality we require to manage our cloud infrastructure in a simple, yet comprehensive way.

"After moving to CloudStack, we redirected our internal development efforts away from the maintenance of our in-house IaaS project and towards the integration of other parts of our infrastructure with CloudStack itself. By taking advantage of the built-in extensibility mechanisms provided by CloudStack and the vibrant developer community built around it, we have already made significant progress integrating our new cloud deployment into our current operations, such as the ability to control our internal DNS servers (powered by BIND) through a plug-in under CloudStack. This plug-in has been submitted to the project and accepted for inclusion in the next CloudStack release."

Fernando continued, "Two key points that influenced our decision to start using CloudStack were the ability to extend and adapt the product for our needs and the fact that it is an open source solution. The ability to extend and adapt it is proving to be extremely valuable to our goal of evolving our infrastructure into a full private cloud solution while taking advantage of several existing services that we already have in place, leveraging much of our previous investments in automation and configuration management.

"On the other hand, working on top of an open source solution gives us the confidence to keep evolving our operations without the risk of becoming trapped inside a single-vendor, proprietary solution. CloudStack is at the heart of our move towards a full cloud-based operation and we will not only increase our reliance on it, but also continue to invest in improving and adapting it to our needs."

Globo.com is advised by ShapeBlue, the globally leading CloudStack integrator and services company.

 


----

Shared via my feedly reader




Sent from my iPad

Citrix, FlexPod and the Shift to Cloud: the Tipping Point? [feedly]



----
Citrix, FlexPod and the Shift to Cloud: the Tipping Point?
// Citrix Blogs

After all these years, why have so many organizations lagged? Why have they hesitated to make the transition from legacy data centers to private or public cloud infrastructure? Despite all of the new cloud enhancements, not to mention innovations in the delivery of personal desktops and workspaces, enterprises continue to take a cautious approach. Large enterprise CIOs have earned a reputation for a conservative approach,…

Read More


----

Shared via my feedly reader




Sent from my iPad

Wednesday, November 19, 2014

AWS re:Invent Highlights Enterprise and DevOps as Key Amazon Priorities [feedly]



----
AWS re:Invent Highlights Enterprise and DevOps as Key Amazon Priorities
// New Relic blog

If there's one thing that Amazon Web Services (AWS) is good at, it's moving fast. As AWS senior vice president Andy Jassy explained at the AWS re:Invent keynote session last Wednesday in Las Vegas (watch the video here), the company is well on its way toward releasing more than 500 new services and features for its customers in 2014. At the re:Invent conference alone, AWS announced close to a dozen new offerings, all of which were met with plenty of applause from the conference's more than 13,000 attendees.

For the New Relic team that attended re:Invent, it was an especially exciting couple of days; not only for the opportunity it gave us to catch up with some of our joint customers, but also because it gave us insight into Amazon's key priorities and how New Relic's software analytics offerings can help complement them.

Helping enterprises achieve business value

Enterprise cloud adoption was a key theme at re:Invent. And to address the needs of enterprises with greater security and compliance concerns, AWS announced the following new services:

  • AWS Key Management Service: A managed service that makes it easy to create and control the encryption keys used to encrypt and protect data.
  • AWS Config: Another security-focused managed service that provides users with resource inventory, configuration history, and configuration change notifications.
  • AWS Service Catalog: A service that allows IT departments to create customized catalogs of resources for end users to access, helping ensure compliance with business policies.

Of course, security and compliance are only part of the bigger picture. Whether they're only starting to migrate or have already migrated to the cloud, enterprises running business-critical applications need end-to-end monitoring to ensure they're delivering the best possible experience to their users. (Case in point: New Relic and AWS customer Condé Nast.)

Amazon Aurora was another major product announcement that was targeted toward the enterprise audience. A MySQL-based database service that promises more than five times the performance speed at a tenth of the cost of traditional databases, Amazon Aurora is further evidence of AWS's focus on data warehousing and database services for the enterprise. And seeing that many New Relic customers use MySQL, Amazon Aurora appears to be a technology worthy of support via New Relic Platform.

Speeding up software development

DevOps also emerged as a recurring theme at re:Invent. "Agility is the holy grail," said Amazon CTO Werner Vogels at the Thursday morning keynote (watch the video here). In addition to the well-received announcement of the Docker-friendly Amazon EC2 Container Service, AWS introduced a number of new developer-focused tools:

  • AWS CodeDeploy: A service that automates code deployments to Amazon EC2 instances.
  • AWS CodePipeline: A continuous delivery and release automation service that lets you design your development workflow for checking in code, building the code, and deploying your app into staging, testing, and production.
  • AWS CodeCommit: A managed source control service that hosts private Git repositories.

From development, build and test, to deployment, these new products are aligned to each stage of the software development lifecycle. And like New Relic tools, they're designed to make life easier for developers and DevOps teams.

Going "All-In" with AWS—and New Relic

An overarching narrative of going "all-in" with AWS ran throughout many re:Invent sessions and talks. As companies successfully move their dev and test applications to the cloud, more and more are then migrating production and mission-critical apps as well, eventually leading to the all-in mindset.

If your company is also planning to go "all-in" with the cloud, New Relic is happy to help migrate and improve the performance of your applications in the process. We've joined forces with managed services providers like 2nd Watch to optimize AWS environments. And don't forget that you can also use New Relic's AWS plugins to help bring your cloud data points into performance-related context. To learn more about using New Relic with AWS, visit: www.newrelic.com/aws.



----

Shared via my feedly reader




Sent from my iPad

Welcome to GitHub, .NET! [feedly]



----
Welcome to GitHub, .NET!
// The GitHub Blog

Microsoft announced at their Connect event last week that they will be open sourcing much of the .NET technology stack, as well as moving development of these technologies over to GitHub.

And within hours of the announcement contributions were already being accepted!

You can browse and search the projects that Microsoft has made public on GitHub over on their landing page.

This isn't the first team from Microsoft to join us on GitHub - the Windows Azure, Microsoft Open Tech, TypeScript and ASP.NET teams are already on GitHub, collaborating in the open with the community.

If you're one of the 6 million developers building applications using .NET, this is your chance to contribute to the future direction of your development stack. Check out the GitHub help site or GitHub Guides to learn more about contributing to open source.


----

Shared via my feedly reader




Sent from my iPad

DevOps Newsletter 203 [feedly]





Sent from my iPad

CVE-2014-8090: Another Denial of Service XML Expansion [feedly]



----
CVE-2014-8090: Another Denial of Service XML Expansion
// Ruby News

Unrestricted entity expansion can lead to a DoS vulnerability in REXML, like "Entity expansion DoS vulnerability in REXML (XML bomb, CVE-2013-1821)" and "CVE-2014-8080: Parameter Entity expansion DoS vulnerability in REXML". This vulnerability has been assigned the CVE identifier CVE-2014-8090. We strongly recommend upgrading Ruby.

Details

This is an additional fix for CVE-2013-1821 and CVE-2014-8080. The previous patches fixed recursive expansions in a number of places and the total size of created Strings. However, they did not take into account the former limit used for entity expansion. 100% CPU utilization can occur as a result of recursive expansion with an empty String. When reading text nodes from an XML document, the REXML parser can be coerced into allocating extremely large string objects which can consume all of the memory on a machine, causing a denial of service.

Impacted code will look something like this:

require 'rexml/document'

xml = <<XML
<!DOCTYPE root [
  # ENTITY expansion vector
]>
<cd></cd>
XML

p REXML::Document.new(xml)

All users running an affected release should either upgrade or use one of the workarounds immediately.

Affected versions

  • All Ruby 1.9 versions prior to Ruby 1.9.3 patchlevel 551
  • All Ruby 2.0 versions prior to Ruby 2.0.0 patchlevel 598
  • All Ruby 2.1 versions prior to Ruby 2.1.5
  • prior to trunk revision 48402

Workarounds

If you cannot upgrade Ruby, use this monkey patch as a workaround:

class REXML::Document
  def document
    self
  end
end

Credits

Thanks to Tomas Hoger for reporting this issue.

History

  • Originally published at 2014-11-13 12:00:00 UTC

Posted by usa on 13 Nov 2014


----

Shared via my feedly reader




Sent from my iPad

Ruby 2.1.5 Released [feedly]



----
Ruby 2.1.5 Released
// Ruby News

Ruby 2.1.5 has been released.

This release includes a security fix for a DoS vulnerability in REXML. It is similar to the vulnerability fixed in the previous release, but is a new and distinct issue.

And, some bug fixes are also included. See tickets and ChangeLog for details.

Download

  • http://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.5.tar.bz2

    SIZE:   11994454 bytes
    MD5:    a7c3e5fec47eff23091b566e9e1dac1b
    SHA256: 0241b40f1c731cb177994a50b854fb7f18d4ad04dcefc18acc60af73046fb0a9
    SHA512: d4b1e3c2b6a0dc79846cce056043c48a2a2a97599c76e9a07af21a77fd10e04c8a34f3a60b6975181bff17b2c452af874fa073ad029549f3203e59095ab70196
  • http://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.5.tar.gz

    SIZE:   15127433 bytes
    MD5:    df4c1b23f624a50513c7a78cb51a13dc
    SHA256: 4305cc6ceb094df55210d83548dcbeb5117d74eea25196a9b14fa268d354b100
    SHA512: a7da8dc755e5c013f42269d5e376906947239b41ece189294d4355494a0225590ca73b85261ddd60292934a8c432231c2308ecfa137ed9e347e68a2c1fc866c8
  • http://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.5.tar.xz

    SIZE:   9371780 bytes
    MD5:    8a30ed4b022a24acbb461976c9c70789
    SHA256: 22ba1eb8d475c9ed7e0541418d86044c1ea4c093ab79c300c38fc0f721afe9a3
    SHA512: 8a257da64158d49bc2810695baf4b5849ef83e3dde452bf1e4823e52e8261225427d729fce2fb4e9b53d6d17ca9c96d491f242535c2f963738b74f90944e2a0b
  • http://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.5.zip

    SIZE:   16657694 bytes
    MD5:    810cd05eb03c00f89b0b03b10e9a3606
    SHA256: 69c517a6d3ea65264455a9316719ffdec49cf6a613a24fd89b3f6da7146a8aa7
    SHA512: a55cf5970203904e7bc8cef2b6fbf7b8d5067a160289a1a49d13c4dfef8c95002bcdf697f5d04d420ef663efad5ee80d5a9e4e7445c4db9a02f9cbc9e4b8444e

Release Comment

Sorry for the inconvenience of the frequent releases, and thanks to everyone who cooperated in getting this release out.

Posted by nagachika on 13 Nov 2014


----

Shared via my feedly reader




Sent from my iPad

Personalized Recommendations at Etsy [feedly]



----
Personalized Recommendations at Etsy
// Code as Craft

Providing personalized recommendations is important to our online marketplace.  It benefits both buyers and sellers: buyers are shown interesting products that they might not have found on their own, and products get more exposure beyond the seller's own marketing efforts.  In this post we review some of the methods we use for making recommendations at Etsy.  The MapReduce implementations of all these methods are now included in our open-source machine learning package "Conjecture" which was described in a previous post.

Computing recommendations basically consists of two stages.  In the first stage we build a model of users' interests based on a matrix of historic data, for example, their past purchases or their favorite listings (those unfamiliar with matrices and linear algebra see e.g., this  review).  The models provide vector representations of users and items, and their inner products give an estimate of the level of interest a user will have in the item (higher values denote a greater degree of estimated interest).  In the second stage, we compute recommendations by finding a set of items for each user which approximately maximizes the estimate of the interest.

The model of users and items can be also used in other ways, such as finding users with similar interests, items which are similar from a "taste" perspective, items which complement each other and could be purchased together, etc.

Matrix Factorization

The first stage in producing recommendations is to fit a model of users and items to the data. At Etsy, we deal with "implicit feedback" data, where we observe only indicators of users' interactions with items (e.g., favorites or purchases). This is in contrast to "explicit feedback," where users give ratings (e.g., 3 of 5 stars) to items they've experienced. We represent this implicit feedback data as a binary matrix: an element is one where the user liked the item (i.e., favorited it) and zero where they did not. The zeros do not necessarily indicate that the user is not interested in that item, only that they have not expressed an interest so far. This may be due to disinterest or indifference, or due to the user not having seen that item yet while browsing.

An implicit feedback dataset in which a set of users have "favorited" various items. Note that we do not observe explicit dislikes, only the presence or absence of favorites.
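As a concrete illustration, a toy favorites list like the one pictured can be assembled into such a binary matrix; this is a minimal sketch in which the user and item names are made up for the example:

import numpy as np

# hypothetical (user, item) favorite pairs
favorites = [("alice", "shelf_a"), ("alice", "print_b"),
             ("bob", "shelf_a"), ("carol", "print_b"), ("carol", "mug_c")]

users = sorted({u for u, _ in favorites})
items = sorted({i for _, i in favorites})
user_idx = {u: k for k, u in enumerate(users)}
item_idx = {i: k for k, i in enumerate(items)}

P = np.zeros((len(users), len(items)))   # implicit feedback matrix
for u, i in favorites:
    P[user_idx[u], item_idx[i]] = 1.0    # 1 = favorited, 0 = no observed interest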

 

The underpinning assumption that matrix factorization models make is that the affinity between a user and an item is explained by a low-dimensional linear model. This means that each item and user really corresponds to an unobserved real vector of some small dimension. The coordinates of the space correspond to latent features of the items (these could be things like: whether the item is clothing, whether it has chevrons, whether the background of the picture is brown, etc.), and the elements of the user vector describe the user's preferences for these features. We may stack these vectors into matrices, one for users and one for items; the observed data is then, in theory, generated by taking the product of these two unknown matrices and adding noise:

The underpinning low-dimensional model from which the observed implicit feedback data is generated; "d" is the dimension of the model.

 

We therefore find a vector representation for each user and each item.  We compute these vectors so that the inner product between a user vector and item vector will approximate the observed value in the implicit feedback matrix (i.e., it will be close to one in the case the user favorited that item and close to zero if they didn't).

 

The results of fitting a two-dimensional model to the above dataset. In this small example the first discovered feature roughly corresponds to whether the item is a shelf or not, and the second to whether it is in a "geometric" style.

 

Since the zeros in the matrix do not necessarily indicate disinterest in the item, we don't want to force the model to fit to them, since the user may actually be interested in some of those items.  Therefore we find the decomposition which minimizes a weighted error function, where the weights for nonzero entries in the data matrix are higher than those of the zero entries.  This follows a paper which suggested this method.  How to set these weights depends on how sparse the matrix is, and could be found through some form of cross validation.
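Written out, the weighted objective being minimized takes roughly this form (a standard formulation of this weighted approach, with p_ui the observed 0/1 entry, x_u and y_i the user and item vectors, c_ui the per-entry weight that is larger for favorites than for zeros, and a regularization term that the post does not spell out explicitly):

\min_{X, Y} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2 \; + \; \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)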

What happens when we optimize the weighted loss function described above is that the reconstructed matrix (the product of the two factors) will often have positive elements where the input matrix has zeros, since we don't force the model to fit these as well as the non-zeros. These are the items which the user may be interested in but has not seen yet. The reason this happens is that, in order for the model to fit well, users who have shown interest in overlapping sets of items will have similar vectors, and likewise for items. Therefore the unexplored items which are liked by other users with similar interests will often have a high value in the reconstructed matrix.

Alternating Least Squares

To optimize the model, we alternate between computing the item matrix and the user matrix, and at each stage we minimize the weighted squared error while holding the other matrix fixed (hence the name "alternating least squares"). At each stage we can compute the exact minimizer of the weighted square error, since an analytic solution is available. This means that each iteration is guaranteed not to increase the total error, and to decrease it unless the two matrices already constitute a local minimum of the error function. Therefore the entire procedure gradually decreases the error until a local minimum is reached. The quality of these minima can vary, so it may be reasonable to repeat the procedure and select the best result, although we do not do this. A demo of this method in R is available here.
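The following is a minimal, in-memory sketch of these alternating updates, not the MapReduce implementation in Conjecture; the confidence weighting 1 + alpha * P and the parameter values are illustrative choices in the spirit of the weighted approach described above:

import numpy as np

def weighted_als(P, n_factors=2, alpha=10.0, reg=0.1, n_iters=10, seed=0):
    """Weighted ALS for implicit feedback: observed favorites get weight
    1 + alpha, unobserved zeros get weight 1."""
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    X = 0.1 * rng.standard_normal((n_users, n_factors))  # user vectors
    Y = 0.1 * rng.standard_normal((n_items, n_factors))  # item vectors
    C = 1.0 + alpha * P                                   # per-entry weights

    for _ in range(n_iters):
        # exact weighted least-squares update for each user, items held fixed
        for u in range(n_users):
            W = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ W @ Y + reg * np.eye(n_factors),
                                   Y.T @ W @ P[u])
        # then the symmetric update for each item, users held fixed
        for i in range(n_items):
            W = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ W @ X + reg * np.eye(n_factors),
                                   X.T @ W @ P[:, i])
    return X, Y

After fitting, X @ Y.T approximately reconstructs the feedback matrix, and the unexplored items with the highest reconstructed values for a user are the natural recommendation candidates.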

This computation lends itself very naturally to implementation in MapReduce, since e.g., when updating a vector for a user, all that is needed are the vectors for the items which he has interacted with, and the small square matrix formed by multiplying the items matrix by its own transpose.  This way the computation for each user typically can be done even with limited amounts of memory available, and each user may be updated in parallel.  Likewise for updating items.  There are some users which favorite huge numbers of items and likewise items favorited by many users, and those computations require more memory.  In these cases we can sub-sample the input matrix, either by filtering out these items, or taking only the most recent favorites for each user.

After we are satisfied with the model, we can continue to update it as we observe more information, by repeating a few steps of the alternating least squares every night, as more items, users, and favorites come online.  New items and users can be folded into the model easily, so long as there are sufficiently many interactions between them and existing users and items in the model respectively.  Productionizable MapReduce code for this method is available here.

Stochastic SVD

The alternating least squares described above gives us an easy way to factorize the matrix of user preferences in MapReduce. However, this technique has the disadvantage of requiring several iterations, sometimes taking a long time to converge to a quality solution. An attractive alternative is the Stochastic SVD. This is a recent method which approximates the well-known Singular Value Decomposition of a large matrix, and which admits a non-iterative MapReduce implementation. We implement this as a function which can be called from any scalding Hadoop MapReduce job.

A fundamental result in linear algebra is that the matrix formed by truncating the singular value decomposition after some number of dimensions is the best approximation to that matrix (in terms of square error) among all matrices of that rank. However, with this method we cannot apply the same "weighting" to the error as we did when optimizing via alternating least squares. Nevertheless, for datasets where the zeros do not completely overwhelm the non-zeros, this method is viable. For example, we use it to build a model from favorites, whereas it fails to provide a useful model from purchases, which are much more sparse and where this weighting is necessary.
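A rough sketch of the randomized projection at the heart of a stochastic SVD is shown below; this is the basic scheme, without the power iterations or the scalding/MapReduce plumbing a production implementation would involve, and the parameter names and values are illustrative:

import numpy as np

def stochastic_svd(A, rank, oversample=10, seed=0):
    """Approximate the top singular triplets of A via a random sketch."""
    rng = np.random.default_rng(seed)
    # random test matrix and an orthonormal basis for the sketched column space
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    # exact SVD of the much smaller projected matrix, then lift back
    B = Q.T @ A
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_b[:, :rank], s[:rank], Vt[:rank]

Because the returned factors are orthonormal, a new user's favorites row p_new can be folded in with just p_new @ Vt.T / s, a diagonal rescaling rather than a general matrix inversion, which is the property the next paragraph relies on for building vectors on the fly.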

An advantage of this method is that it produces matrices with a nice orthonormal structure, which makes it easy to construct the vectors for new users on the fly (outside of a nightly recomputation of the whole model), since no matrix inversions are required.  We also use this method to produce vector representations of other lists of items besides those a user favorited, for example treasuries and other user curated lists on Etsy.  This way we may suggest other relevant items for those lists.

Producing Recommendations

Once we have a model of users and items we use it to build product recommendations.  This is a step which seems to be mostly overlooked in the research literature.  For example, we cannot hope to compute the product of the user and item matrices, and then find the best unexplored items for each user, since this requires time proportional to the product of the number of items and the number of users, both of which are in the hundreds of millions.

One research paper suggests using a tree data structure to allow for a non-exhaustive search of the space, by pruning away entire sets of items where the inner products would be too small. However, we observed this method not to work well in practice, possibly due to the curse of dimensionality with the type of models we were using (with hundreds of dimensions).

Therefore we use approximate methods to compute the recommendations. The idea is to first produce a candidate set of items, then rank them according to their inner products with the user vector, and take the highest ones. There are a few ways to produce candidates, for example the listings from a user's favorite shops, or those textually similar to their existing favorites. However, the main method we use is "locality sensitive hashing" (LSH), where we divide the space of user and item vectors into several hash bins and then take the set of items which are mapped to the same bin as each user.
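Once a candidate set exists, the ranking step is just an inner-product sort. A minimal sketch, assuming item_vecs and a user vector come from one of the factorizations above and candidate_ids from any of the candidate sources:

import numpy as np

def rank_candidates(user_vec, item_vecs, candidate_ids, k=10):
    """Score candidate items by inner product with the user vector, keep the top k."""
    scores = item_vecs[candidate_ids] @ user_vec
    best = np.argsort(scores)[::-1][:k]
    return [candidate_ids[i] for i in best]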

Locality Sensitive Hashing

Locality sensitive hashing is a technique used to find approximate nearest neighbors in large datasets.  There are several variants, but we focus on one designed to handle real-valued data and to approximate the nearest neighbors in the Euclidean distance.

The idea of the method is to partition the space into a set of hash buckets, so that points which are near to each other in space are likely to fall into the same bucket.  The way we do this is by constructing some number "p" of planes in the space so that they all pass through the origin.  This divides the space up into 2^p convex cones, each of which constitutes a hash bucket.

Practically we implement this by representing the planes in terms of their normal vectors.  The side of the plane that a point falls on is then determined by the sign of the inner product between the point and the normal vector (if the planes are random then we have non-zero inner products almost surely, however we could in principle assign those points arbitrarily to one side or the other).  To generate these normal vectors we just need directions uniformly at random in space.  It is well known that draws from an isotropic Gaussian distribution have this property.

We number the hash buckets so that the i^th bit of the hash-code is 1 if the inner product between a point and the i^th plane is positive, and 0 otherwise.  This means that each plane is responsible for a bit of the hash code.

After we map each point to its respective hash bucket, we can compute approximate nearest neighbors, or equivalently, approximate recommendations, by examining only the vectors in the bucket.  On average the number in each bucket will be 2^{-p} times the total number of points, so using more planes makes the procedure very efficient.  However it also reduces the accuracy of the approximation, since it reduces the chance that nearby points to any target point will be in the same bucket.  Therefore to achieve a good tradeoff between efficiency and quality, we repeat the hashing procedure multiple times, and then combine the outputs.  Finally, to add more control to the computational demands of the procedure, we throw away all the hash bins which are too large to allow efficient computation of the nearest neighbors.  This is implemented in Conjecture here.
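A minimal sketch of this random-hyperplane hashing, covering a single hash table (in practice, as described above, several independent tables are combined and oversized buckets are discarded):

import numpy as np

def lsh_codes(vectors, n_planes=16, seed=0):
    """Assign each vector to a hash bucket: bit i of the code is 1 exactly
    when the inner product with the i-th random plane's normal is positive."""
    rng = np.random.default_rng(seed)
    normals = rng.standard_normal((n_planes, vectors.shape[1]))  # one Gaussian normal per plane
    bits = (vectors @ normals.T > 0).astype(int)
    return bits @ (1 << np.arange(n_planes))   # pack the bits into an integer bucket id

Hashing user and item vectors with the same set of planes puts each user into a bucket; the items sharing that bucket id form the candidate set that is then ranked by inner product as above.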

Other Thoughts

Above are the basic techniques for generating personalized recommendations.  Over the course of developing these recommender systems, we found a few modifications we could make to improve the quality of the recommendations.

  • Normalizing the vectors before computation: As stated, the matrix factorization models tend to produce vectors with large norms for the popular items. As a result, some popular items may get recommended to many users even if they are not necessarily the most aligned with a user's tastes. Therefore, before computing recommendations, we normalized all the item vectors. This also makes the use of approximate nearest neighbors theoretically sound: when all vectors have unit norms, the maximum inner products with a user vector are achieved by the nearest item vectors.
  • Shop diversity: Etsy is a marketplace consisting of many sellers. To be fair to these sellers, we limit the number of recommendations from a single shop that we present to each user. Since users may click through to the shop anyway, exposing additional items available from that shop, this does not seem to present a problem in terms of recommendation quality.
  • Item diversity: To make the recommendations more diverse, we take a candidate set of, say, 100 nearest neighbors to the user, then filter them by removing any item which is within some small distance of a higher-ranked item, where distance is measured as the Euclidean distance between the item vectors (see the sketch after this list).
  • Reranked Items from Similar Users: We used the LSH code to find nearest neighbors among the users (so for each user we find users with similar tastes). To produce item recommendations we can then take those users' favorite listings and re-rank them according to the inner products between the item vectors and the target user's vector. This led to seemingly better and more relevant recommendations, although a proper experiment remains to be done.
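Below is a minimal sketch of the normalization and item-diversity steps from the list above; the function name and thresholds are illustrative rather than taken from Conjecture:

import numpy as np

def diverse_top_k(user_vec, item_vecs, k=20, min_dist=0.5):
    """Normalize item vectors, rank them by inner product with the user vector,
    and drop any item within min_dist (Euclidean) of a higher-ranked pick."""
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    order = np.argsort(items @ user_vec)[::-1]   # best-scoring items first
    picked = []
    for idx in order:
        if all(np.linalg.norm(items[idx] - items[j]) >= min_dist for j in picked):
            picked.append(idx)
        if len(picked) == k:
            break
    return picked

In practice the items being filtered would be the candidate set from the LSH buckets rather than the full catalogue, but the filtering logic is the same.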

Conclusion

In summary we described how we can build recommender systems for e-commerce based on implicit feedback data.  We built a system which computes recommendations on Hadoop, which is now part of our open source machine learning package "Conjecture."  Finally we shared some additional tweaks that can be made to potentially improve the quality of recommendations.


----

Shared via my feedly reader




Sent from my iPad

Apache Kafka [feedly]



----
Apache Kafka
// Food Fight

Watch Now

Panel

Outline

Picks

Brandon

Jay

Download


The Food Fight Show is brought to you by Bryan Berry and Nathen Harvey with help from other hosts and the awesome community of Chefs.

The show is sponsored, in part, by Chef.

Feedback, suggestions, and questions: info@foodfightshow.com or http://github.com/foodfight/showz.


----

Shared via my feedly reader




Sent from my iPad

Canon Achieves Datacenter Automation and Increases Virtualization by Over 60% [feedly]



----
Canon Achieves Datacenter Automation and Increases Virtualization by Over 60%
// Virtualization Management Software & Data Center Control | VMTurbo » VMTurbo Blog

We've said it before and we'll say it again: datacenter automation is the way of the future. That being said, we understand that it is a daunting step to take. One of our brave customers, and a leading provider of … READ MORE

The post Canon Achieves Datacenter Automation and Increases Virtualization by Over 60% appeared first on Virtualization Management Software & Data Center Control | VMTurbo.


----

Shared via my feedly reader




Sent from my iPad

Flaky Tests and Monkeys: What I learned at GTAC 2014 [feedly]



----
Flaky Tests and Monkeys: What I learned at GTAC 2014
// Puppet Labs

The Google Test Automation Conference was held in the Kirkland, WA Google campus over two days last month. This is the first conference I've attended that is dedicated to what I do for a living - developing automated tests and testing infrastructure.


----

Shared via my feedly reader




Sent from my iPad