David Carasso david at carasso.com


Objective: Principal Scientist, Chief Scientist, Technical Marketing – related to intelligent internet applications, in the SF Bay Area (SF, East Bay, Marin).

Software Skills: 18 years of commercial software development, 7 years of Java, 11 years of C++, 2 years of Python, and 12 years of web development. MacOSX/WindowsXP/Linux/Unix.

Research Skills: Information Retrieval, Natural Language Processing, Information Extraction, Automatic Email Response, Question-Answering Systems, Workflow systems, Case-Based Reasoning, Artificial Intelligence, Search Engine Categorization, Automatic Ratings Systems, and Automatic Lexicon/Directory Creation.

Marketing Skills: Initiated, designed, documented, created, and managed many successful software products; built web sites; created and delivered PowerPoint presentations to hundreds of people; promoted company products at trade shows, in print, and in radio interviews; and created the company sales model for web products.


Splunk, Inc. (Feb 2005 – Current) Principal Scientist / Chief Mind

Responsible for innovating and prototyping a class of "hard problems" -- including data mining, dynamic tagging of events, automatic field extraction, the search language framework, file classification, event aggregation, automatic vocabulary building, and machine learning. Prototyping in Python and deploying in C++.

Designed and built a system for anonymizing log file data.

Filed two provisional patents -- the first for generating a hyperlinked web of high-level concepts and causality from a set of log files, and the second for automatically anonymizing log files so that sensitive logs can be shared with outside organizations.

NexTag, Inc. (Aug 2004 – Feb 2005) Principal Scientist

Responsible for research and development of new scoring algorithms to improve customer click-through, used by millions of users for shopping comparison. Implemented a major scoring change that took into account a historical, time-weighted clicks-per-impression function as well as the positional rank of the clicked search result. Developed in Java and Oracle SQL.

Designed and built a rule system and a convenient, user-friendly web interface for query correction.

InQuira, Inc. (May 2002 – Jul 2004) Architect

Responsible for research and development of key parts of the InQuira corporate search engine, used at AT&T, GE, Bank of America, Honda, Yahoo, Fidelity, BEA, and other sites.

Researched and built a system to automatically generate a structured WordNet-like Taxonomy from a set of unstructured corporate documents. The system relied on statistical analysis, WordSpace, phrase lattices, template patterns, and a static taxonomy to bootstrap itself.

Researched and built a Google knock-off that crawls and searches corporate sites, returning results similar to Google's. The algorithm used document PageRank as well as term frequency.

Designed and built these additional modules: workflow, XML rules engine storage, CVS integration, business rules engine, configuration administration, variable instantiation, PageRank scoring, document recency, and results display.

BlackPearl Software (March 2001 – April 2002) Artificial Intelligence Architect

Responsible for research and development of all AI technology. Researched and built prototypes and modules for the flagship product, including 1) a deployed system at the world’s largest bank to present users with only useful news articles, 2) an information extraction module to ‘read’ corporate press releases and generate ‘facts’ for the main product’s rule-based system, 3) a Case-Based Reasoning Search module with automatic question answering, 4) a Natural Language Processing module to improve querying, and 5) the next-generation Ontology, tying rules, cases, and concepts together.

Made numerous technical sales presentations to large financial services corporations. Developed proofs of concept for prospects – writing customer knowledge bases, casebases, and databases – and customizing servlets.

Innovato.Com (December 2000 – February 2001) Founder and Developer

Started company, built web site, and developed software. Software applications developed include:

Automatic Email Response System that automatically answers corporate email, responding to plain English, using advanced linguistic technology, syntactic and semantic knowledge, fuzzy matching, and tolerance for misspelling.  The system responds better with each customer interaction.

Personal Website Agent that responds to natural language queries, with facial expressions, and automatically navigates the user around the corporate web site for answers.

Automatic Marketing Polling Agent that scans the internet, summarizing positive and negative user impressions of a product, person, or place. Automatically determines effectiveness of marketing campaigns on customer attitudes, from day to day.

Ask Jeeves Corporation [NASDAQ: ASKJ] (April 2000 – December 2000) Senior Scientist, Advanced Development Group

Hired by CEO and CTO to work alone and build the Next Generation of Ask Jeeves. Initiated, researched, designed, and built:

Automatic Question Answerer: Advanced Java Web Application that automatically answers user questions by "reading" unstructured text from the web and extracting answers. For example, queries such as “Who is the CEO of IBM?”, “When was the microwave invented?”, or “Where is UCLA?” would return exact answers, rather than long lists of documents to read. This product was to replace many of the knowledge engineers that made Ask Jeeves unscalable.

Parser: Statistical Natural Language Parser to handle the exceedingly noisy “English” of web queries.

Query Analyzer: Automatically finds spelling and stemming variations (e.g., nucular, nuclear, nuculear); finds new abbreviations (e.g., smeg, tlc, cvcc, wwf) and definitions; finds question variations by clustering similar queries (e.g., "what animals live in the rain forest", "what kind of animals live in australia").

Authoring Assistant: Generated new dictionary terms from seed terms and unstructured text on the web, automatically learning rules to find hints at structure.  For example, given “Homer, Bart, and Maggie” it would discover “Lisa and Marge”.

Information Retrieval: Wrote an IR engine to spider and index corporate web sites, generating internal term relationships and XML needed by Ask Jeeves to import into a knowledge base. This was part of an initiative to greatly speed up deployment at corporate sites.

Inference Corporation [NASDAQ: INFR, EGAN] (July 1989 – March 2000) Chief Scientist

Initiated, designed, and co-wrote the products that brought in the vast bulk of Inference's revenue. [Details on web]

Managed 11 software engineers, working simultaneously on more than five products, delivering all products on time. [Details on web]

Researched and built prototypes in Java and C++ relating to Information Retrieval, Information Extraction, Lexicon Creation, Query Expansion, Text Summarization, Subject Breaking, Spelling Correction, Natural Language Processing, Search Engine Categorization, Question-Answering Systems, and Case-Based Reasoning.

Entirely initiated, designed, and wrote iFind (later renamed Inference Find or InFind) in 1995, the first meta-search engine on the web. InFind retrieves search results from multiple search engines and dynamically clusters similar results together. John C. Dvorak, of PC Magazine and Ziff-Davis TV, named InFind the best internet search engine. [Details on web]

Entirely initiated, designed, and wrote CasePoint WebServer, the first case-based reasoning application on the web, allowing fast fuzzy queries and automatic query refinement via question answering. Created sales model for Inference's web products.

Initiated, designed, and co-wrote the CasePoint product, a small, fast Case-Based Reasoning application, which enabled companies to have their customers solve their own customer-service problems. Compaq, for example, shipped CasePoint with a casebase of common computer problems to millions of customers.

Co-designed, wrote, and tested a joint multimillion-dollar research project with Microsoft Corporation for NT 4.0. The project allowed intelligent finding and browsing of information on a file system. Statistically relevant terms were clustered and used to generate questions on the fly for query refinement that allowed users to narrow or expand their search.

Ported the CBR Kernel to work with Japanese 2-byte Kanji characters for Japanese distribution of Art*Enterprise, CBR Express, and CasePoint, by Nichimen Corporation.

Responsible for Art*Enterprise's Case-Based Reasoning (CBR) Kernel, including writing, testing, documenting, and maintaining the CBR Kernel. Researched learning algorithms, data mining, noise elimination, and text preprocessing.

Zurf, Inc. (subsidiary of Inference) (1996 – 1998) Chief Technology Officer

Created a new Inference subsidiary to sell consumer products.

Initiated, designed, and co-wrote ZurfRider, a PC meta-internet search tool, highly rated by dozens of PC magazines and newspapers. [Details on web]

ZurfRider called 70 other search engines, dynamically clustering similar results and automatically generating dynamic refinement questions to narrow and expand the search. Built company web site, including on-line credit card sales, upgrades, and technical support. Promoted company's product at trade shows and in print and radio interviews.  Managed three software engineers.

Education:

University of Southern California, M.S. Computer Science, 1991.
University of California, Berkeley, B.A. Computer Science, 1989.