Data & Services - Collections

Create and manage Solr search collections for full-text indexing

Overview

Collections in ColdFusion are Solr-based full-text search indexes that allow you to index and search documents, database content, and file system data. Collections provide powerful search capabilities including faceted search, highlighting, and relevance ranking.

Collection Operations

Manage Solr collections through the ColdFusion Administrator interface or programmatically.

Create Collection

Name: Unique identifier for the collection
Path: File system location where collection data is stored
Language: Language for text analysis and stemming (English, Spanish, French, etc.)

Best Practice: Choose descriptive collection names and organize by content type or application.

Index Collection

Index Type: File system path, database query, or custom programmatic indexing
Extensions: File types to index (PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, HTML, TXT, etc.)
Recursive: Index subdirectories recursively for directory-based indexing
Return URL: Base URL prepended to file paths in search results

Performance Tip: Schedule large indexing operations during off-peak hours to minimize server impact.

Optimize Collection

Optimizes the Solr index for better search performance by consolidating index segments and removing deleted documents. Run periodically after major index updates or bulk deletions to maintain optimal search speed.
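
Optimization can also be triggered from application code. The following is a minimal sketch, assuming a collection named "myDocuments" (the name used in this document's other examples) already exists:

```cfml
<cfscript>
// Consolidate index segments for the "myDocuments" collection.
// Assumption: the collection already exists on this server.
cfcollection(
  action = "optimize",
  collection = "myDocuments"
);
</cfscript>
```

Because optimization is I/O intensive, a call like this is best placed in a scheduled task that runs during low-traffic hours rather than in a user-facing request.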

Repair Collection

Repairs corrupted or damaged collection indexes by rebuilding internal index structures. Use when experiencing inconsistent search results, index corruption errors, or after unexpected server shutdowns.

Delete Collection

Permanently removes a collection and all its indexed data. This operation cannot be undone. Back up important collection data before deletion.
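
Since deletion cannot be undone, it helps to make the call explicit and to report failures rather than let them halt the request. A hedged sketch, where the collection name is a placeholder:

```cfml
<cfscript>
// Hypothetical collection to remove; deletion is permanent.
targetCollection = "myDocuments";

try {
  cfcollection(action = "delete", collection = targetCollection);
  writeOutput("Collection '#targetCollection#' deleted.");
} catch (any e) {
  // Report the failure instead of halting the request.
  writeOutput("Could not delete '#targetCollection#': #encodeForHTML(e.message)#");
}
</cfscript>
```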

Creating Collections Programmatically

Use the cfcollection tag to create and manage collections in your application code:

Creating a New Collection (CFScript)

// Create a new collection
cfcollection(
  action = "create",
  collection = "myDocuments",
  path = expandPath("/collections/myDocuments"),
  language = "english"
);

// Report completion (this line only runs if the create call did not throw)
writeOutput("Collection 'myDocuments' created successfully!");

Creating a New Collection (tag syntax)

<!--- Create a new collection --->
<cfcollection
  action="create"
  collection="myDocuments"
  path="#expandPath('/collections/myDocuments')#"
  language="english">

<!--- Report completion (this line only runs if the create call did not throw) --->
<cfoutput>Collection 'myDocuments' created successfully!</cfoutput>
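
To make collection setup safe to run more than once, the create call can be guarded by a lookup against the registered collections. This is a sketch under the assumption that the list action returns a query with a name column; the variable names are illustrative:

```cfml
<cfscript>
// Illustrative collection name, matching the example above.
collectionName = "myDocuments";

// List registered collections; the result is a query with a "name" column.
cfcollection(action = "list", name = "allCollections");

// Create the collection only when it is not already registered.
if (!listFindNoCase(valueList(allCollections.name), collectionName)) {
  cfcollection(
    action = "create",
    collection = collectionName,
    path = expandPath("/collections/" & collectionName),
    language = "english"
  );
}
</cfscript>
```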

Indexing Strategies

Choose the right indexing approach based on your content source and update frequency requirements.

File System Indexing

Use Case: Index documents stored on the file system (PDFs, Office docs, HTML, text files)
Supported Formats: PDF, Microsoft Office (DOC, DOCX, XLS, XLSX, PPT, PPTX), HTML, XML, plain text
Features: Recursive directory indexing, extension filtering, automatic content extraction

Best Practice: Schedule regular re-indexing to capture updates. Use recursive indexing for directory structures. Monitor file system permissions.

Example: Index File System Directory

Indexing Files from Directory (CFScript)

// Index all PDF and Office documents in a directory
cfindex(
  action = "update",
  collection = "myDocuments",
  type = "path",
  key = expandPath("/documents"),
  extensions = ".pdf,.doc,.docx,.xls,.xlsx",
  recurse = true,
  urlpath = "https://example.com/documents"
);

writeOutput("Directory indexed successfully!");

Indexing Files from Directory (tag syntax)

<!--- Index all PDF and Office documents in a directory --->
<cfindex
  action="update"
  collection="myDocuments"
  type="path"
  key="#expandPath('/documents')#"
  extensions=".pdf,.doc,.docx,.xls,.xlsx"
  recurse="true"
  urlpath="https://example.com/documents">

<cfoutput>Directory indexed successfully!</cfoutput>
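
The advice to schedule regular re-indexing could be sketched with cfschedule. This assumes the indexing code above has been saved as a page reachable at the hypothetical URL shown; the task name, URL, and start time are all placeholders:

```cfml
<cfscript>
// Register (or update) a daily task that requests the indexing page.
// Assumptions: "reindexDocuments" and the URL below are hypothetical names.
cfschedule(
  action = "update",
  task = "reindexDocuments",
  operation = "HTTPRequest",
  url = "https://example.com/tasks/reindex.cfm",
  startDate = dateFormat(now(), "mm/dd/yyyy"),
  startTime = "02:00 AM",
  interval = "daily"
);
</cfscript>
```

Running the task in the early morning follows the off-peak recommendation given earlier for large indexing operations.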

Database Indexing

Use Case: Index content stored in database tables (articles, products, user content)
Features: Custom SQL queries, column-to-field mapping, incremental updates, WHERE clause filtering
Performance: Use incremental indexing with WHERE clauses to update only changed records

Best Practice: Create database indexes on date columns used for incremental updates. Use batching for large datasets. Consider using cfthread for background indexing.

Example: Index Database Content

Indexing Database Records (CFScript)

// lastIndexDate is assumed to hold the time of the previous indexing run
// Query articles modified since then
articles = queryExecute("
  SELECT id, title, body, category, dateModified
  FROM articles
  WHERE dateModified > :lastIndexDate
", {
  lastIndexDate: {value: lastIndexDate, cfsqltype: "cf_sql_timestamp"}
});

// Index the query results; query takes the name of the query variable,
// and key, title, body, and custom1 name columns in that query
cfindex(
  action = "update",
  collection = "articles",
  type = "custom",
  query = "articles",
  key = "id",
  title = "title",
  body = "body",
  custom1 = "category",
  urlpath = "https://example.com/article/"
);

writeOutput("#articles.recordCount# articles indexed!");

Indexing Database Records (tag syntax)

<!--- lastIndexDate is assumed to hold the time of the previous indexing run --->
<!--- Query articles modified since then --->
<cfquery name="articles">
  SELECT id, title, body, category, dateModified
  FROM articles
  WHERE dateModified > <cfqueryparam value="#lastIndexDate#" cfsqltype="cf_sql_timestamp">
</cfquery>

<!--- Index the query results; key, title, body, and custom1 name columns in the query --->
<cfindex
  action="update"
  collection="articles"
  type="custom"
  query="articles"
  key="id"
  title="title"
  body="body"
  custom1="category"
  urlpath="https://example.com/article/">

<cfoutput>#articles.recordCount# articles indexed!</cfoutput>
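
The example above relies on a lastIndexDate value that must survive between runs. One possible way to track it is a small checkpoint file, sketched here; the file location and the fallback date are assumptions, not part of ColdFusion's indexing API:

```cfml
<cfscript>
// Hypothetical file that stores the time of the last successful indexing run.
stampFile = expandPath("/collections/articles.lastindex.txt");

// Fall back to a fixed early date on the first run so every record is indexed.
lastIndexDate = fileExists(stampFile)
  ? parseDateTime(trim(fileRead(stampFile)))
  : createDateTime(2000, 1, 1, 0, 0, 0);

// Capture the start time before querying so records modified during the run
// are picked up again next time rather than skipped.
runStarted = now();

// ... run the query and cfindex call from the example above here ...

// Record the new checkpoint only after indexing has completed.
fileWrite(stampFile, dateTimeFormat(runStarted, "yyyy-mm-dd HH:nn:ss"));
</cfscript>
```

Writing the checkpoint last means a failed run leaves the old timestamp in place, so the next run retries the same records.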

Custom Programmatic Indexing

Use Case: Index dynamic content, aggregated data, or content from external APIs
Features: Fine-grained control over indexed fields, custom metadata, real-time indexing
Control: Use the cfindex tag to add, update, or delete individual documents programmatically

Use Cases: Index content at creation time, index aggregated reports, index external API data, real-time search updates.

Example: Custom Document Indexing

Programmatic Document Indexing (CFScript)

// Index a custom document with metadata
cfindex(
  action = "update",
  collection = "products",
  type = "custom",
  key = productID,
  title = productName,
  body = productDescription,
  custom1 = category,
  custom2 = price,
  custom3 = inStock ? "yes" : "no",
  urlpath = "https://example.com/product/#productID#"
);

writeOutput("Product #productID# indexed successfully!");

Programmatic Document Indexing (tag syntax)

<!--- Index a custom document with metadata --->
<cfindex
  action="update"
  collection="products"
  type="custom"
  key="#productID#"
  title="#productName#"
  body="#productDescription#"
  custom1="#category#"
  custom2="#price#"
  custom3="#inStock ? 'yes' : 'no'#"
  urlpath="https://example.com/product/#productID#">

<cfoutput>Product #productID# indexed successfully!</cfoutput>
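
Removing a single document uses the same tag with the delete action. A minimal sketch, assuming documents in the "products" collection were indexed with the product ID as the key; the ID value is a placeholder:

```cfml
<cfscript>
// Hypothetical ID of a product that has been discontinued.
productID = 1042;

// Remove the single document whose key matches productID.
cfindex(
  action = "delete",
  collection = "products",
  type = "custom",
  key = productID
);
</cfscript>
```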

Search Features

Leverage Solr's powerful search capabilities to build rich search experiences.

Full-Text Search

Search across all indexed content with automatic stemming and relevance ranking. Results are scored by relevance and can be sorted by score, date, or custom fields.

Field-Specific Search

Target specific fields in queries using field:value syntax. Example: title:ColdFusion matches only documents whose title field contains "ColdFusion".

Boolean Operators

Combine search terms with AND, OR, NOT operators for complex queries. Example: ColdFusion AND (tutorial OR guide) NOT beginner

Wildcard Search

Use the * and ? wildcards for partial matches. * matches any number of characters; ? matches a single character. Example: Cold* matches "ColdFusion", "Cold", etc.

Phrase Search

Exact phrase matching using quotes. Useful for finding exact text sequences. Example: "ColdFusion Administrator" matches the exact phrase.

Proximity Search

Find terms within N words of each other using ~N syntax. Example: "ColdFusion tutorial"~5 finds documents where these words appear within 5 words of each other.
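
The query forms described in the preceding sections are all passed through the criteria attribute of cfsearch. The sketch below runs each sample string against the "myDocuments" collection used elsewhere in this document and prints the hit count; the collection contents and the variable names are assumptions:

```cfml
<cfscript>
// Sample criteria strings: field-specific, boolean, wildcard, phrase, proximity.
queries = [
  'title:ColdFusion',
  'ColdFusion AND (tutorial OR guide) NOT beginner',
  'Cold*',
  '"ColdFusion Administrator"',
  '"ColdFusion tutorial"~5'
];

for (q in queries) {
  // Each criteria string is sent to the collection unchanged.
  cfsearch(name = "hits", collection = "myDocuments", criteria = q);
  writeOutput("<p>#encodeForHTML(q)#: #hits.recordCount# result(s)</p>");
}
</cfscript>
```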

Faceted Search

Category-based filtering with counts. Display facets for categories, tags, dates, etc. Enables drill-down filtering in search interfaces.

Result Highlighting

Automatically highlight search terms in results with context. Shows snippets of text where search terms appear for better relevance display.
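
Highlighting can be requested through the context attributes of cfsearch, which populate a context column in the result. This is a sketch only: the passage count, byte limit, and the choice of mark tags are illustrative, and behavior may vary by ColdFusion version:

```cfml
<cfscript>
// Request up to two context passages with matched terms wrapped in <mark> tags.
cfsearch(
  name = "results",
  collection = "myDocuments",
  criteria = "ColdFusion tutorial",
  contextPassages = 2,
  contextBytes = 300,
  contextHighlightBegin = "<mark>",
  contextHighlightEnd = "</mark>",
  maxrows = 10
);

for (row in results) {
  // The context column already contains the highlight markup, so it is not encoded.
  writeOutput("<h3>#encodeForHTML(row.title)#</h3><p>#row.context#</p>");
}
</cfscript>
```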

Example: Search Collection

Searching and Displaying Results (CFScript)

// Search for documents
results = "";
cfsearch(
  name = "results",
  collection = "myDocuments",
  criteria = "ColdFusion tutorial",
  maxrows = 10,
  startrow = 1
);

// Display search results
writeOutput("<h2>Found #results.recordCount# results</h2>");
for (row in results) {
  writeOutput("
    <div>
      <h3><a href='#row.url#'>#row.title#</a></h3>
      <p>#row.summary#</p>
      <p>Score: #row.score# | Size: #row.size# bytes</p>
    </div>
  ");
}

Searching and Displaying Results (tag syntax)

<!--- Search for documents --->
<cfsearch
  name="results"
  collection="myDocuments"
  criteria="ColdFusion tutorial"
  maxrows="10"
  startrow="1">

<!--- Display search results; columns are scoped to the query so that
      "url" is not confused with the URL scope --->
<cfoutput>
  <h2>Found #results.recordCount# results</h2>
  <cfloop query="results">
    <div>
      <h3><a href="#results.url#">#results.title#</a></h3>
      <p>#results.summary#</p>
      <p>Score: #results.score# | Size: #results.size# bytes</p>
    </div>
  </cfloop>
</cfoutput>
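
The maxrows and startrow attributes in the example above can drive simple pagination. A minimal sketch, assuming the page number arrives as a URL parameter named page (a hypothetical name) and a page size of 10:

```cfml
<cfscript>
pageSize = 10;

// Assumption: the page number arrives as ?page=N; default to the first page.
param name="url.page" default="1";
pageNumber = max(1, val(url.page));

// Page 1 starts at row 1, page 2 at row 11, and so on.
firstRow = (pageNumber - 1) * pageSize + 1;

cfsearch(
  name = "results",
  collection = "myDocuments",
  criteria = "ColdFusion tutorial",
  maxrows = pageSize,
  startrow = firstRow
);

writeOutput("<p>Page #pageNumber#: showing #results.recordCount# result(s) starting at row #firstRow#</p>");
</cfscript>
```

Passing the value through val() and max() keeps a missing, non-numeric, or negative parameter from producing an invalid start row.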

Best Practices

Follow these guidelines for optimal collection management and search performance.

Collection Organization

Create separate collections for different content types or applications. Use meaningful collection names that reflect their purpose (e.g., "products", "articles", "documentation").

Regular Index Updates

Schedule regular index updates to keep content fresh. Use incremental indexing for large datasets to minimize resource impact.

Collection Optimization

Optimize collections periodically for better search performance. Run optimization during low-traffic periods to minimize user impact.

Language Analyzers

Use language-specific analyzers for better search quality. Choose the appropriate language when creating collections to enable proper stemming and stop words.

Monitoring

Monitor collection size and performance metrics regularly. Track index size, search response times, and memory usage to detect issues early.

Memory Allocation

Configure appropriate Solr memory allocation based on collection size. Increase Solr heap size for large collections or high search volumes.

Performance Tuning

Optimize Solr collections for maximum search performance and minimal resource usage.

Field Optimization

Limit the number of indexed fields to only what's needed for search. Use stored fields sparingly to reduce index size and improve performance.

Solr Cache Settings

Configure Solr cache settings for frequently accessed data. Tune filter cache, query result cache, and document cache sizes based on usage patterns.

Optimization Scheduling

Run optimization during low-traffic periods to avoid performance degradation. Optimization is I/O intensive and can temporarily impact search performance.

JVM Tuning

Monitor Solr heap usage and adjust JVM settings as needed. Set appropriate -Xmx and -Xms values, and tune garbage collection for large indexes.

Dedicated Server

Consider running Solr on a dedicated server for production environments. Isolating Solr improves reliability and allows independent scaling.

Index Size Management

Keep index sizes manageable by archiving old content. Split very large collections into multiple smaller collections for better performance.
