INOVAYT, Inc. creates unique datasets that are of high quality, designed for time series analysis, and ideally suited for investments and trading. Using a combination of machine learning algorithms and human workflow, INOVAYT is able to tackle difficult problems like identifying company aliases. This recipe can be applied to many different kinds of datasets. "Big data" can then be boiled down to its core value and nicely packaged for assimilation into other systems, such as yours.
- US Trademarks: This is extremely high quality United States trademark data — cleaned, normalized, mapped to companies, and designed for time series analysis. You can incorporate this data into your existing model or as a standalone model with proven performance.
- Wikipedia Activity (almost ready for production): Wikipedia page activity as it relates to publicly traded companies
- US Patents (almost ready for evaluation): United States patent applications as time series data for publicly traded companies
- US CRE (in research stage): United States commercial real estate purchases/leases as time series data for publicly traded companies
- The Planet Dataset (in research stage): Uniquely valuable data gathered by over 100 satellites
Consider this... "Today's alternative data is tomorrow's fundamental data."
Trademarks are actually a great example of this. Even though a given company's trademarks are not on the books, they are undeniably an indication of a company's innovation, marketing, productization, and what is worth protecting. A company's trademarks are much like a fingerprint or a signature. It is understood that there is value in this data; it's just that it has been too difficult to boil down into intelligible data, until now. INOVAYT has it.
INOVAYT has taken the headaches out of dealing with trademark data. The US government does provide most of the raw data. But, as you might expect, it is riddled with errors, inconsistencies, missing values, rule changes, name mismatches, and more. Just to get started, you would have to deal with over 200GB of unwieldy data (and that doesn't even include the images). The company name alias/mismatch issue is a notoriously difficult problem. A given company can have literally dozens of aliases associated with its trademarks. These aliases can also be difficult to mine; for example, the trademarks for "Toys R Us" were mostly filed under "Geoffrey" (yes, their giraffe mascot). In its raw form, a trademark is actually a log (with a lot of "unnecessary" data). In our trademarks dataset, everything has been boiled down to just the right calculated fields.
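To make the alias problem concrete, here is a minimal Python sketch of the general idea: map raw owner names to a canonical company and count filings per company per year. It is not INOVAYT's method, which combines machine learning with human workflow. The alias table, the suffix list, the function names, and the sample records are all hypothetical and only illustrate the "Geoffrey" example above.

```python
from collections import Counter

# Hypothetical alias table: normalized owner name -> canonical company.
# A real table would be far larger and curated, not hard-coded.
ALIASES = {
    "geoffrey": "Toys R Us",
    "toys r us": "Toys R Us",
}

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co"}


def normalize(name):
    """Lowercase, drop punctuation, and strip common legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def canonical_company(owner):
    """Return the canonical company for a raw owner name, or None if unknown."""
    return ALIASES.get(normalize(owner))


def filings_per_year(records):
    """Count filings per (company, year) from (owner, year) pairs."""
    counts = Counter()
    for owner, year in records:
        company = canonical_company(owner)
        if company is not None:
            counts[(company, year)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [("Geoffrey, Inc.", 1999), ("TOYS R US", 1999), ("Acme", 2000)]
counts = filings_per_year(sample)
```

A plain lookup like this only covers aliases that are already known; discovering that "Geoffrey" belongs to "Toys R Us" in the first place is the hard part.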
Here are some articles that discuss the value of trademarks.
If you need a large sample, such as for backtesting, please contact us (see below).
The US Trademarks data for the S&P 100 companies was used for backtesting, covering the last 20 years. Several fairly basic algorithms/methodologies were devised. Results are reported for one of them (the "trademarks-algorithm"); it had the best overall performance, but only by a small margin, so it is not an outlier. The algorithm is not a result of any machine learning and there is no risk of overfitting.
Using the "trademarks-algorithm":
- The CAGR (compound annual growth rate) was approximately 1.8 times that of the S&P 500 index.
- The risk-adjusted growth rate, using the Sortino Ratio as the measure, was approximately 1.4 times that of the S&P 500 index.
- The CAGR was higher than that of the S&P 500 index in 15 of the 20 years.
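For readers who want to reproduce these kinds of measures on their own backtests, the two metrics above can be sketched in Python as follows. This uses the standard textbook definitions, which are assumed here; the exact conventions used for the reported figures (target return, return frequency, annualization) are not stated on this page. The function names and sample numbers are illustrative only.

```python
import math


def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def sortino_ratio(returns, target=0.0):
    """Mean return in excess of the target, divided by downside deviation.

    Downside deviation is the root mean square of the shortfalls below the
    target, taken over all periods (periods above the target count as zero).
    """
    if not returns:
        raise ValueError("returns must not be empty")
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    downside_dev = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside_dev == 0:
        return math.inf  # no period fell below the target
    return mean_excess / downside_dev


# Made-up numbers, for illustration only (not backtest results).
growth = cagr(100.0, 121.0, 2)
ratio = sortino_ratio([0.10, -0.05, 0.20, -0.10])
```

Unlike the Sharpe ratio, the Sortino ratio penalizes only downside volatility, which is why it is often preferred for strategies with asymmetric returns.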
If you're ready for more information or you are interested in analyzing some companies that we don't have listed, please just email us.